8D i«« 2n 



DOCUMENT RESUME



ea 006 606 



AUTHOR          Brennan, Robert L.
TITLE           Generalizability Analyses: Principles and Procedures. ACT Technical Bulletin No. 26.
INSTITUTION     American College Testing Program, Iowa City, Iowa. Research and Development Div.
REPORT NO       ACT-TB-26
PUB DATE        Sep 77
NOTE            88p.; Revision of a paper presented at the Annual Meeting of the American Educational Research Association (61st, New York, New York, April 4-8, 1977)

EDRS PRICE      MF-$0.83 HC-$4.67 Plus Postage.
DESCRIPTORS     *Analysis of Variance; *Mathematical Models; Measurement; *Measurement Techniques; Orthogonal Rotation; Problems; *Reliability; Research Methodology; Sampling; Standard Error of Measurement; *Statistical Analysis; Test Interpretation; True Scores
IDENTIFIERS     *Generalizability Theory

ABSTRACT
Rules, procedures, and algorithms intended to aid researchers and practitioners in the application of generalizability theory to a broad range of measurement problems are presented. Two examples of measurement research are G studies, which examine the dependability of some general measurement procedure; and D studies, which provide the data for substantive decisions. Major emphasis is given to the estimation of G study variance components, and to the estimation and use of D study variance components for different objects of measurement and different universes of generalization. D studies in which the universe of generalization contains facets that are either fixed or essentially infinite are discussed, as well as D studies that involve sampling from a finite universe. A notational system is introduced to facilitate the discussion; and each rule, procedure or algorithm is illustrated using designs that involve varying types and degrees of complexity. (Author/MV)



Documents acquired by ERIC include many informal unpublished materials not available from other sources. ERIC makes every effort to obtain the best copy available. Nevertheless, items of marginal reproducibility are often encountered and this affects the quality of the microfiche and hardcopy reproductions ERIC makes available via the ERIC Document Reproduction Service (EDRS). EDRS is not responsible for the quality of the original document. Reproductions supplied by EDRS are the best that can be made from the original.



ACT TECHNICAL BULLETIN NO. 26



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL INSTITUTE OF
EDUCATION POSITION OR POLICY.



Generalizability Analyses:
Principles and Procedures
by

Robert L. Brennan



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC) AND
USERS OF THE ERIC SYSTEM."



The American College Testing Program 



The Research and Development Division
The American College Testing Program
P. O. Box 168, Iowa City, Iowa 52240
September, 1977



Generalizability 



Abstract

This paper presents "rules," procedures, and algorithms intended to aid researchers and practitioners in the application of generalizability theory to a broad range of measurement problems. Major emphasis is given to the estimation of G study variance components, and to the estimation and use of D study variance components for different objects of measurement and different universes of generalization. Consideration is given to D studies in which the universe of generalization contains facets that are either fixed or essentially infinite, as well as D studies that involve sampling from a finite universe. A notational system is introduced to facilitate the discussion; and each "rule," procedure, or algorithm is illustrated using designs that involve varying types and degrees of complexity.



Table of Contents

Glossary of Symbols
Introduction
    Background and Terminology
    Overview
A Notational System and Analysis of Variance Considerations for G Studies
    Notation for ANOVA Designs
    Main Effects and Interaction Effects
    Structural Models
    Sums of Squares
G Study Considerations and the Estimation of Variance Components for the Random Effects Model
    Variance Components - Notation and Terminology
    Estimation of Variance Components for Random Effects Models
    Expected Mean Squares
D Studies for Random, Fixed and Mixed Models
    D Study Variance Components
    D Study Summary Statistics
    Combining D Study Variance Components
    Illustrative D Studies
Sampling from Finite Universes
    G Study Considerations
    D Study Considerations
Comments and Conclusions
Reference Notes
References



Glossary of Symbols

Symbol                      Definition
x                           "Crossed with."
:                           "Nested within."
[illegible]                 A facet; or a specific condition of a facet.
[illegible]                 A set of conditions for a facet; or the sample mean for a set of conditions for a facet.
$\alpha$, $\gamma$          Generic symbols for any component or source of variance in a G or D study.
$\tau$                      Facet that serves as object of measurement for some D study.
$\tilde{\mu}_{\alpha}$      Score effect for the component $\alpha$.
$\mu_{\alpha}$              Mean score for the component $\alpha$.
$\tilde{X}_{\alpha}$        Observed score analogue of $\tilde{\mu}_{\alpha}$.
$X_{\alpha}$                Observed score analogue of $\mu_{\alpha}$.
e                           Random error.
E                           Expected value.
$\sigma^2(\alpha)$          Random effects variance component for $\alpha$ (given sampling from an infinite universe).
n                           G study sample size for a facet.
N                           Size of universe of admissible observations for a facet in the G study.



Symbol                      Definition
n'                          D study sample size for a facet.
N'                          Size of universe of generalization for a facet in the D study.
MS($\alpha$)                Mean square for $\alpha$.
EMS($\alpha$)               Expected mean square for $\alpha$.
$\sigma^2(\tau)$            Universe score variance for a D study.
$\sigma^2(\Delta)$          Variance of differences between observed scores and universe scores.
$\sigma^2(\delta)$          Variance of differences between observed deviation scores and universe scores expressed in deviation form.
$\sigma^2(\epsilon)$        Variance of differences between universe scores and regression estimates of universe scores.
$E\sigma^2(X)$              Expected observed score variance.
$E\rho^2$                   Generalizability coefficient.
v                           Main effect index in $\gamma$.
F                           Set of facets that are fixed in a D study.
R                           Set of facets that are randomly sampled in a D study.
f($\alpha$)                 See Equation 4.
d($\alpha$|$\gamma$)        See Equation 13.



Introduction

Classical test theory provides a very simple structural model of the relationship between observed, true, and error scores. However, the simplicity of the model necessitates some rather restrictive assumptions if the model is to be applied to real data. Generalizability theory liberalizes and extends classical test theory in several important respects. For example, the theory of generalizability does not necessitate the classical test theory assumption of "parallel" tests; rather, generalizability theory employs the weaker assumption of "randomly parallel" tests. Also, classical test theory assumes that errors of measurement are sampled from an undifferentiated univariate distribution. By contrast, generalizability theory allows for the existence of multiple types and sources of error through the application of analysis of variance procedures, or, more specifically, through the application of the general linear model to the dependability of measurement. Consequently, generalizability theory is applicable to a broad range of testing and evaluation studies that arise in education and psychology.

Background and Terminology

The basic theoretical foundation for generalizability theory can be found in papers by Cronbach, Rajaratnam, and Gleser (1963) and Gleser, Cronbach, and Rajaratnam (1965). These papers were followed by an extensive explication of generalizability theory in a monograph by Cronbach, Gleser, Nanda, and Rajaratnam (1972) entitled The Dependability of Behavioral Measurements. However, the use of analysis of variance approaches to reliability issues did not begin with the publications of Cronbach and his colleagues, even though it is they who have most clearly and completely formulated reliability issues in analysis of variance terms. Over 35 years ago Burt (1936), Hoyt (1941), and Jackson and Ferguson (1941) discussed analysis of variance approaches to reliability. Subsequent contributions were made by Alexander (1947), Ebel (1951), Finlayson (1951), Loveland (1952), and Burt (1955). Also, Lindquist (1953), in the last chapter of his experimental design text, discussed in considerable detail the estimation of variance components in reliability studies. In fact, in several respects work by Burt and Lindquist appears to anticipate the development of generalizability theory. Additional evidence of the role of analysis of variance in reliability issues can be seen in the work of Webster (1960) and Medley and Mitzel (1963) not long before the original publication by Cronbach et al. (1963) of the theory of generalizability.

Although generalizability theory borrows its statistical models and research designs from analysis of variance, there are some changes in emphasis, terminology, and interpretation. For example, in analysis of variance, the magnitudes of variance components sometimes receive direct attention (see, for example, Vaughan & Corballis, 1969), but the ultimate goal is usually a test (or tests) of statistical significance. In generalizability theory interest is focused primarily on the magnitude of variance components and, to some extent, generalizability coefficients. Tests of statistical significance receive less direct emphasis.



In generalizability theory, any observation on some object of measurement (e.g., school, class, student) is assumed to be sampled from a universe of observations. While universe and population are logically equivalent terms, here the word population is reserved for the object of measurement, and the word universe is reserved for the conditions under which observations are made. Any observation from the universe can be characterized by the conditions under which the observation is made. The set of all possible conditions of a particular kind is called a facet.

Generalizability theory also emphasizes the distinction between G studies, which examine the dependability of some general measurement procedure, and D studies, which provide the data for substantive decisions (Rajaratnam, 1960). "For example, the published estimates of reliability for a college aptitude test are based on a G study. College personnel officers employ these estimates to judge the accuracy of data they collect on their own applicants (D study)" (Cronbach et al., 1972, p. 16). The primary purpose of the G study is to estimate components of variance, which may then be used in a variety of D studies. The G study and the D study may be the same study, or they may be different studies using the same design. Generally, however, G studies are most useful when they employ complex designs and large sample sizes to provide stable estimates of as many variance components as possible.

Based on the difference between a G study and a D study, Cronbach et al. (1972) make a further distinction between the universe of admissible observations and the universe of generalization.

    The test developer or other investigator who carried out a G study
    takes certain facets into consideration and, with respect to each
    facet, considers a certain range of conditions. The observations
    encompassed by the possible combinations of conditions that the G
    study represents is called the universe of admissible observations.
    We may also speak of the universe of admissible conditions of a
    certain facet. A decision maker, applying essentially the same
    measuring technique, proposes to generalize to some universe of
    conditions all of which he sees as eliciting samples of the same
    information. We refer to that as the universe of generalization.
    The G study can serve this decision maker only if its universe of
    admissible conditions is identical to or includes the proposed
    universe of generalization. Different decision makers may propose
    different universes of generalization. A G study that defines
    the universe of admissible observations broadly, encompassing all
    the likely universes of generalization, will be useful to various
    decision makers. (p. 20)
Overview

In this paper "rules," procedures, and algorithms are presented that involve a notational system, analysis of variance considerations, G studies, and D studies. In addition, all "rules," procedures, and algorithms are illustrated using designs that involve varying levels and types of complexity.

The notational system used here differs, in some respects, from that used in the Cronbach et al. (1972) monograph on generalizability theory. The primary difference is that the notational system for variance components used in this paper does not necessitate specifically reporting the effects that are confounded in a design that involves nesting. Nevertheless, this notational system does implicitly "carry the meaning" of a nested component. In most other respects, the notation used by Cronbach and his colleagues has been maintained or minimally altered.

The terminology used in some analysis of variance literature is not always the same as the terminology employed by Cronbach and his colleagues in discussing generalizability theory. For example, the word "facet" in generalizability theory has approximately the same connotation as "main effect" in much of the analysis of variance literature. Also, the word "component" in Cronbach et al. (1972) is basically synonymous with the word "effect" in some analysis of variance literature. One of the purposes of this paper is to help practitioners familiar with analysis of variance literature to understand and apply generalizability theory. Therefore, some terminological compromises are made here. Generally, the terminology employed is that of Cronbach et al. (1972); but exceptions do occur, especially in initial sections that primarily treat analysis of variance considerations. When terminological ambiguities arise an attempt is made to resolve them, or at least clarify them.

The major portion of this paper is devoted to a consideration of "rules," procedures, and algorithms for performing G studies and D studies. Particular emphasis is given to the estimation of G study variance components, and to the estimation and use of D study variance components for different objects of measurement and different universes of generalization. Most of the discussion treats D studies in which the universe of generalization contains facets that are either fixed or essentially infinite. However, consideration is also given to D studies that involve sampling from a finite universe of generalization.

There are some restrictions placed upon the treatment of generalizability analysis in this paper. In particular, with minor exceptions, only orthogonal analysis of variance designs are considered; i.e., designs that do not involve missing data and/or unequal size subgroups. Also, all designs and studies involve only one dependent variable; i.e., this paper treats univariate generalizability theory, as opposed to multivariate generalizability theory (see Cronbach et al., 1972, Chapter 10). Finally, the "rules," algorithms, and procedures are not intended to cover, in depth or breadth, the extensive treatment of generalizability theory provided by Cronbach and his colleagues. Rather, this paper is intended to provide researchers and practitioners with a set of procedures to facilitate the application of generalizability theory to a broad range of measurement problems.



A Notational System and Analysis of
Variance Considerations for G Studies

The first steps in performing a G study involve the usual initial procedures for an analysis of variance; namely, defining the model and determining sums of squares, degrees of freedom, and mean squares for each of the effects in the G study design. These issues are usually treated in experimental design texts in the context of specific designs. Here, rules and algorithms are provided that are applicable to a large class of orthogonal, or balanced, designs.

Notation for ANOVA Designs

Using the symbols "x" to mean "crossed with" and ":" to mean "nested within," most common analysis of variance designs can be represented by a suitable sequence of effect indices and symbols. In this paper, five different designs will be used for illustrative purposes: p x i, p x i x o, p x (i:s), (p:c) x i, and (p:c) x (i:s:t). [In Cronbach et al. (1972), p x i x o is called Design VII, p x (i:s) is Design V-A, and (p:c) x i is essentially Design V-B.] The indices in these designs can be interpreted as referring to a person (p), a class (c), an item (i), a subtest (s), an occasion (o), and a test (t). For example, (p:c) x i can be interpreted as meaning that persons are nested within classes, and both persons and classes are crossed with items. The use of these specific identifying words for each index is maintained throughout this paper; however, it is the nature of the design that is under consideration, not the names associated with the indices.

These designs have been chosen for two reasons. First, they involve different types and degrees of complexity in applying the "rules" and procedures which will be presented. Second, these designs are typical of the kinds of designs that do occur in testing and evaluation studies. Most of the classical results from test theory come from a consideration of the basic design for persons crossed with items, p x i. The design p x i x o, which Cronbach et al. (1972) treat in great detail, is a simple extension of this basic design. In many realistic situations, however, some degree of nesting is present. For example, it is very common for items to be nested within subtests, as in the design p x (i:s). Also, in many testing studies, persons are nested within classes, as in the design (p:c) x i. Finally, an extensive testing study may involve considerable nesting, as in the design (p:c) x (i:s:t).

Main Effects and Interaction Effects

Figure 1 provides a Venn diagram representation for each of the illustrative designs. In these Venn diagrams, the mean square for a main effect is represented by a circle (of any size), and the mean square for an interaction effect is represented by the intersection of two or more circles. (The words "effect" and "component" are basically synonymous terms; however, we will use the term "effect" here because the phrases "main component" and "interaction component" are rare in ANOVA literature.)

Insert Figure 1 about here




Notation for Main Effects. A main effect can be represented by

    {main effect index} : {first nesting index(es)} : {second nesting index(es)} : ...

If the main effect is not a nested main effect, then it can be represented by the main effect index only.

For example, in the design p x i, the main effect for persons is denoted p, and the main effect for items is denoted i. In the design (p:c) x i, the (nested) main effect for persons is p:c, where the main effect index is p, and the nesting index is c. Similarly, in the design (p:c) x (i:s:t), the (nested) main effect for items is i:s:t, the (nested) main effect for subtests is s:t, and the main effect for tests is t. In general, the number of main effects is equal to the number of indices in the symbolic representation of a design.

In some monographs and textbooks, main effects are called treatments, factors, or facets. However, not all effects are easily interpretable as treatments, and the word "factor" is apt to cause confusion with factor analysis. Here the terms "main effect" and "facet" are used synonymously, unless otherwise noted.

Notation for Interaction Effects. Each interaction effect can be represented as a combination of main effects in the following manner:

    {combination of main effect indexes} : {combination of first nesting indexes} : {combination of second nesting indexes} : ...

subject to the constraint that no index may appear more than once in any interaction effect.

Insert Tables 1-5 about here



Tables 1-5 list the main effects and interaction effects for each of the five illustrative designs using the notation defined above. Consider, for example, the design (p:c) x (i:s:t) in Table 5. The interaction of c and t is simply ct. The interaction of c and s:t is cs:t; that is, combinations of cs are nested within t (see Figure 1). Similarly, the interaction of p:c and i:s:t is pi:cs:t; that is, combinations of pi are nested within combinations of cs, which, in turn, are nested within t. Also, note that the interaction of p:c and c would be pc:c, but this possibility is ruled out by two occurrences of the index c.
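The notation rule for interaction effects can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DESIGN` dictionary, the level-by-level list representation of a main effect, and the function names `combine` and `all_effects` are hypothetical conveniences, not part of the paper. Each main effect is stored as a list of index levels (main effect index first, then successive nesting indexes), main effects are merged level by level, and any combination in which an index would appear more than once is discarded.

```python
from itertools import combinations

# Hypothetical encoding of the design (p:c) x (i:s:t): each main effect is a
# list of levels, main effect index first, then successive nesting indexes.
DESIGN = {
    "p:c": [["p"], ["c"]],
    "c": [["c"]],
    "i:s:t": [["i"], ["s"], ["t"]],
    "s:t": [["s"], ["t"]],
    "t": [["t"]],
}


def combine(effects):
    """Merge main effects level by level; return None if any index repeats."""
    depth = max(len(effect) for effect in effects)
    levels = [
        [idx for effect in effects if k < len(effect) for idx in effect[k]]
        for k in range(depth)
    ]
    flat = [idx for level in levels for idx in level]
    if len(flat) != len(set(flat)):
        return None  # ruled out: an index would appear more than once
    return ":".join("".join(level) for level in levels)


def all_effects(design):
    """Return the main effects followed by every admissible interaction."""
    names = list(design)
    found = list(names)
    for size in range(2, len(names) + 1):
        for group in combinations(names, size):
            label = combine([design[name] for name in group])
            if label is not None:
                found.append(label)
    return found


for effect in all_effects(DESIGN):
    print(effect)
```

With this encoding, the combination of p:c and c is rejected because c would occur twice, while p:c combined with i:s:t yields pi:cs:t, in line with the examples in the text.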

Nested Effects and Confounding. Cronbach et al. (1972) usually use a sequence of confounded effects to identify any main effect or interaction effect that involves nesting. For example, if data for the design (p:c) x i were analyzed as if the design were the completely crossed design p x c x i, then the effects would be p, c, i, pc, pi, ci, pic; but some of these effects would be confounded. In particular, the main effect p:c, in the design (p:c) x i, represents the confounding of two of the effects, p and pc, from the design p x c x i. Similarly, the interaction effect pi:c represents the confounding of pi and pic.

Whenever a design involves nesting, there is some degree of confounding. In designs with more than one nested main effect, or more than one level of nesting, the representation of a nested effect by its confounded effects leads to considerable complexity. This is one reason for using the nesting operator in representing effects. Nevertheless, it is frequently useful to know which effects are confounded in a nested effect.

Using the notation introduced above, for any nested effect, the effects that are confounded are all combinations of indices in the effect that include the main effect index (or indexes). For example, in the design (p:c) x (i:s:t), the effect s:t represents the confounding of s and st. Similarly, the effect i:s:t represents the confounding of i, is, it, and ist; and the effect pi:cs:t represents the confounding of pi, pic, pis, pit, pisc, pict, pist, and pisct.

In general, for any nested effect, the number of effects that are confounded is:

    2 Exp (number of nesting indices in component)

For example, in the design (p:c) x (i:s:t), the effect i:s:t has two nesting indices (s and t) and, therefore, this effect has [2 Exp (2)], or four confounded effects. Similarly, the effect pi:cs:t has [2 Exp (3)], or eight confounded effects.

Degrees of Freedom. For any effect (main effect or interaction effect) that is not nested, the degrees of freedom are the product of the (n - 1)'s for the indexes in the effect, where n is the G study sample size associated with an index. For any nested effect, the degrees of freedom are:

    {product of n's for nesting indexes} x {product of (n - 1)'s for main effect indexes}

Degrees of freedom for the effects in each of the five illustrative designs are provided in Tables 1-5. For example, for the design (p:c) x (i:s:t) in Table 5, the degrees of freedom for the main effect s:t are $n_t(n_s - 1)$. Also, for the main effect i:s:t, the degrees of freedom are $n_s n_t(n_i - 1)$; and for the interaction effect pi:cs:t, the degrees of freedom are $n_c n_s n_t(n_p - 1)(n_i - 1)$.
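As a quick check on the degrees of freedom rule, the sketch below computes the product of n's over nesting indexes times the product of (n - 1)'s over main effect indexes. The function name `degrees_of_freedom` and the sample sizes in `n` are hypothetical values chosen for illustration; they do not come from the paper.

```python
from math import prod


def degrees_of_freedom(main, nesting, n):
    """Product of n's for nesting indexes times product of (n - 1)'s for
    main effect indexes; an empty nesting list contributes a factor of 1."""
    return prod(n[k] for k in nesting) * prod(n[k] - 1 for k in main)


# Hypothetical G study sample sizes for the design (p:c) x (i:s:t).
n = {"p": 10, "c": 3, "i": 5, "s": 2, "t": 4}

df_st = degrees_of_freedom(["s"], ["t"], n)                     # n_t (n_s - 1)
df_ist = degrees_of_freedom(["i"], ["s", "t"], n)               # n_s n_t (n_i - 1)
df_picst = degrees_of_freedom(["p", "i"], ["c", "s", "t"], n)   # n_c n_s n_t (n_p - 1)(n_i - 1)
print(df_st, df_ist, df_picst)
```

One useful sanity check in a balanced design is that the degrees of freedom for all effects sum to the total number of observations minus one; for (p:c) x i the five effects p:c, c, i, ci, and pi:c together give $n_c n_p n_i - 1$.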



Structural Models

Consider the design (p:c) x i. For this design, the observed score for person p in class c on item i can be represented by the structural model:

$$X_{pi:c} = \mu + \tilde{\mu}_{p:c} + \tilde{\mu}_{c} + \tilde{\mu}_{i} + \tilde{\mu}_{ci} + \tilde{\mu}_{pi:c} + e \qquad (1)$$

where $\mu$ = grand mean in the universe;
      $\tilde{\mu}_{p:c}$ = effect for person p in class c;
      $\tilde{\mu}_{c}$ = effect for class c;
      $\tilde{\mu}_{i}$ = effect for item i;
      $\tilde{\mu}_{ci}$ = effect for interaction of class c and item i;
      $\tilde{\mu}_{pi:c}$ = effect for interaction of person p in class c on item i; and
      $e$ = random error.

(Note that the structural model for each of the five illustrative designs is provided in the footnotes to Tables 1-5.)



Score Effects. Equation 1 provides a decomposition of the observed score $X_{pi:c}$ in terms of independently estimable effects which we will call score effects. Specifically, we will say that $\tilde{\mu}_{\alpha}$ is the score effect for the component $\alpha$. Since the words "effect" and "component" are basically synonymous, one could also speak of the score component for the effect $\alpha$; however, the author generally prefers the former verbal description because it avoids some verbal ambiguities in subsequent sections.



The usual assumptions concerning score effects, such as those represented in Equation 1, are well documented in the literature and in experimental design texts. First, each effect is assumed to be independent of every other effect. Second, in order to make the estimates of the effects unique, the expected value of each effect over any of its subscripts is set equal to zero. Consider, for example, the effect $\tilde{\mu}_{c}$ in Equation 1, and suppose we take a sample of $n_c$ classes from a universe of $N_c$ classes. The universe of classes is called the universe of admissible observations for the class facet. The second assumption implies that the sum of $\tilde{\mu}_{c}$ over the universe of $N_c$ classes is constrained to be zero, and the sum of the estimates of $\tilde{\mu}_{c}$ over the sample of $n_c$ classes is constrained to be zero. However, it is not necessarily true that the sum of $\tilde{\mu}_{c}$ over the sample of $n_c$ classes is zero. Finally, note that Equation 1 involves no assumptions about the distributional form of the errors.

Mean Scores. Associated with each score effect is a unique mean score. For any component $\alpha$, the mean score is the expected value of the observed score over all indices not contained in $\alpha$. Note that for any facet (i.e., index) the expected value is taken over the universe of admissible observations, and the symbol $E$ is used to denote expectation. For example, from Equation 1:

$$\mu_{p:c} = E_{i}\, X_{pi:c}$$

That is, $\mu_{p:c}$ is the expected value of $X_{pi:c}$ over all items in the universe of admissible observations.

Note, in particular, the distinction between $\tilde{\mu}_{p:c}$ (score effect) and $\mu_{p:c}$ (mean score). Notationally, a score effect always has a tilde (~) associated with it, and a mean score does not. Also, the term "mean score" in this context should not be confused with an observed mean score for a sample, or a universe score for a particular D study, both of which are discussed in considerable detail later.

Using this notational system it is easy to express any mean score in terms of score effects. In general, for the component $\alpha$,

$$\mu_{\alpha} = \mu + \left\{ \text{sum of score effects for all components that consist solely of indices in } \alpha \right\} \qquad (2)$$

For each component in the design (p:c) x i, Table 6 reports equations for mean scores in terms of score effects. Conversely, score effects can be expressed in terms of mean scores.
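Equation 2 amounts to a subset test on index sets, which the following Python sketch makes concrete for the design (p:c) x i. The `COMPONENTS` dictionary and the function name `mean_score_terms` are hypothetical names introduced for this illustration. For a component $\alpha$, the function returns the components whose score effects are added to the grand mean $\mu$ to give $\mu_{\alpha}$.

```python
# Hypothetical encoding of the components of the design (p:c) x i:
# each component name maps to the set of indices it contains.
COMPONENTS = {
    "c": {"c"},
    "i": {"i"},
    "p:c": {"p", "c"},
    "ci": {"c", "i"},
    "pi:c": {"p", "i", "c"},
}


def mean_score_terms(alpha):
    """Components whose score effects enter the mean score for `alpha`,
    i.e. those consisting solely of indices in `alpha` (Equation 2)."""
    target = COMPONENTS[alpha]
    return [name for name, indices in COMPONENTS.items() if indices <= target]


for alpha in COMPONENTS:
    terms = " + ".join(f"effect({name})" for name in mean_score_terms(alpha))
    print(f"mean({alpha}) = mu + {terms}")
```

For instance, the mean score for p:c involves the score effects for c and p:c only, because i is not among its indices.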



Insert Table 6 about here

Algorithm 1: Expressing a Score Effect in Terms of Mean Scores. The following algorithm can be used with any design to express a score effect as a combination of mean scores. Let $\alpha$ be a component with t nesting indices and m main effect indices; then $\tilde{\mu}_{\alpha}$, the score effect associated with $\alpha$, is equal to:

    Step 0: $\mu_{\alpha}$;

    Step 1: Minus the mean scores for components that consist of the t nesting indices and m - 1 of the main effect indices;

    Step 2: Plus the mean scores for components that consist of the t nesting indices and m - 2 of the main effect indices;

    ...

    Step i: Plus (if i is even), or Minus (if i is odd) the mean scores for components that consist of the t nesting indices and m - i of the main effect indices;

    ...

The algorithm terminates with Step m; that is, with the mean score for the component containing only the t nesting indices. If there are no nesting indices in the component $\alpha$, then it follows that Step m results in adding or subtracting $\mu$.

Consider, for example, the component pi:c in the design (p:c) x i. This component has a single nesting index, c, and two main effect indexes, p and i. Step 1 in the algorithm results in subtracting $\mu_{p:c}$ and $\mu_{ci}$ from $\mu_{pi:c}$, because both p:c and ci contain the nesting index, c, and 2 - 1 = 1 main effect index. Step 2 results in adding $\mu_{c}$ to the result of Step 1, because c is the component that contains the nesting index, c, and 2 - 2 = 0 main effect indexes. Therefore,

$$\tilde{\mu}_{pi:c} = \mu_{pi:c} - \mu_{p:c} - \mu_{ci} + \mu_{c}$$

For each component in the design (p:c) x i, Table 6 reports equations for score effects in terms of mean scores.
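Algorithm 1 can be sketched as a loop over the steps, dropping one more main effect index at each step and alternating the sign. The Python below is a minimal illustration, not the author's implementation: the `NAMES` lookup table, the label "mu" for the grand mean, and the function name `score_effect_terms` are assumptions made for the example. Components are identified by their index sets, and the lookup table translates each index set back into the component name used for the design (p:c) x i.

```python
from itertools import combinations

# Hypothetical lookup from index sets to component names for (p:c) x i.
NAMES = {
    frozenset("c"): "c",
    frozenset("i"): "i",
    frozenset("pc"): "p:c",
    frozenset("ci"): "ci",
    frozenset("pic"): "pi:c",
}


def score_effect_terms(main, nesting, names=NAMES):
    """Return (sign, mean score label) pairs following Algorithm 1.

    Step k keeps all nesting indices and m - k of the main effect indices;
    the sign is plus for even k and minus for odd k.
    """
    terms = []
    m = len(main)
    for step in range(m + 1):
        sign = 1 if step % 2 == 0 else -1
        for kept in combinations(main, m - step):
            indices = frozenset(kept) | frozenset(nesting)
            # An empty index set corresponds to the grand mean.
            terms.append((sign, names[indices] if indices else "mu"))
    return terms


print(score_effect_terms(["p", "i"], ["c"]))  # the component pi:c
print(score_effect_terms(["c", "i"], []))     # the component ci
```

For the non-nested component ci, the final step reaches the empty index set, which is why the algorithm ends by adding or subtracting the grand mean.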

Sums of Squares

For each component $\alpha$, the mean score $\mu_{\alpha}$ has an observed score analogue, which we denote $X_{\alpha}$. Similarly, $\tilde{\mu}_{\alpha}$ has an observed score analogue $\tilde{X}_{\alpha}$. For example, in the design (p:c) x i, $X_{c}$ is the observed mean score over the sample of persons and items in class c, and $\tilde{X}_{c}$ is the observed score effect for class c. The relationship between $\mu_{\alpha}$ and $\tilde{\mu}_{\alpha}$ is identical to the relationship between $X_{\alpha}$ and $\tilde{X}_{\alpha}$. That is, Algorithm 1 and Equation 2 are applicable to observed mean scores and observed score effects through replacement of $\mu_{\alpha}$, $\tilde{\mu}_{\alpha}$, and $\mu$ by $X_{\alpha}$, $\tilde{X}_{\alpha}$, and $X$, respectively. In this terminology and notational system the "sums of squares" calculated in the performance of an analysis of variance are, more correctly, the "sums of squares" for observed score effects.

There are two well-known, algebraically identical procedures for determining the sums of squares for observed score effects. The first procedure entails a direct application of the observed score effects. See, for example, the last column of Table 7 for the design (p:c) x i.

Insert Table 7 about here



This procedure is, at least conceptually, the simpler of the two. However,
a computationally easier procedure involves using the sums of squares for
observed mean scores (to be distinguished from the sums of squares for observed
score effects). Kirk (1968), among others, uses this second procedure
extensively. In general, the sum of squares for observed mean scores, for the
component $\alpha$, is

$$[X_{\alpha}] = f(\alpha) \sum X_{\alpha}^{2}, \tag{3}$$

where the summation is taken over all indices in $\alpha$, and $f(\alpha)$ is the number of
observations used to calculate the mean for any one of the levels of $\alpha$.
Specifically,

$$f(\alpha) = \begin{cases} 1, & \text{if } \alpha \text{ includes all indices in the design; and, otherwise,} \\ \text{the product of the G study sample sizes (n's) for the indices not included in } \alpha. \end{cases} \tag{4}$$
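The multiplier in Equations 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the function names `f_alpha` and `ss_mean_scores`, the representation of a component as a set of index letters, and the sample sizes are assumptions made for the example, not part of the original procedure.

```python
from math import prod


def f_alpha(component, sample_sizes):
    """Equation 4: product of the G study sample sizes for the indices
    that are NOT in the component (1 if the component has every index)."""
    return prod(n for index, n in sample_sizes.items() if index not in component)


def ss_mean_scores(component, sample_sizes, observed_means):
    """Equation 3: [X_alpha] = f(alpha) * sum of squared observed mean scores."""
    return f_alpha(component, sample_sizes) * sum(m * m for m in observed_means)


# Hypothetical G study sample sizes for the design (p:c) x i
# (n_p persons per class, n_c classes, n_i items).
sizes = {"p": 4, "c": 3, "i": 5}

f_c = f_alpha({"c"}, sizes)               # indices p and i are not in c
f_pic = f_alpha({"p", "c", "i"}, sizes)   # pi:c contains every index
```

Under these assumed sizes, `f_c` is n_p times n_i and `f_pic` is 1, which matches the two cases of Equation 4.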






The quantities $[X_{\alpha}]$ for each of the components in the design (p:c) x i
are reported in Table 7. Table 7 also provides the sums of squares for observed
score effects expressed in terms of the quantities $[X_{\alpha}]$. Note that the above
terminology directly implies that $[X_{\alpha}]^{\sim}$ is the sum of squares for observed
score effects for the component $\alpha$. Furthermore, Algorithm 1 and Equation 2 are
applicable to the quantities $[X_{\alpha}]$ and $[X_{\alpha}]^{\sim}$ through replacement of $\mu_{\alpha}$, $\mu_{\alpha}^{\sim}$,
and $\mu$ by $[X_{\alpha}]$, $[X_{\alpha}]^{\sim}$, and $[X]$, respectively.

From the above development it follows that the sum of squares (for observed
score effects) associated with the component $\alpha$ is:

$$SS(\alpha) = [X_{\alpha}]^{\sim}, \text{ or} \tag{5}$$

$$SS(\alpha) = f(\alpha) \sum (X_{\alpha}^{\sim})^{2}, \tag{6}$$

where $f(\alpha)$ and $\sum$ have the same interpretation in Equation 6 that they have in
Equation 3.

Equations 5 and 6 are applicable to calculating sums of squares associated
with any component, whether or not it is nested. In addition, for any nested
component, the sum of squares can be obtained by adding the sums of squares
for the confounded effects. For example, in the design (p:c) x i (Figure 1
and Table 4), the component pi:c represents the confounded effects pi and
pic, which are independently estimable in the design p x c x i. Therefore,
to obtain the sums of squares for pi:c, the data can be treated as if they
came from the design p x c x i; and the addition of the sums of squares
associated with pi and pic results in the sums of squares for pi:c. This is
a very useful procedure for performing a G study having nested components,
especially when available computer programs cannot directly accommodate nested
designs.
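The pooling of confounded effects can be checked numerically. The sketch below is a hedged illustration in plain Python: the function name `ss_pi_within_c`, the array layout `x[p][c][i]`, and the small data set are all hypothetical. It computes the sum of squares for pi:c directly from the nested observed score effect, and again as SS(pi) + SS(pic) from a fully crossed reading of the same balanced data.

```python
def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def ss_pi_within_c(x):
    """Return (SS for pi:c computed directly, SS(pi) + SS(pic) from p x c x i).

    x[p][c][i] holds balanced scores; p is the person's position within
    class c, so the same array can be read as nested or as crossed.
    """
    P, C, I = range(len(x)), range(len(x[0])), range(len(x[0][0]))
    cells = [(p, c, i) for p in P for c in C for i in I]
    grand = mean(x[p][c][i] for p, c, i in cells)
    m_p = [mean(x[p][c][i] for c in C for i in I) for p in P]
    m_c = [mean(x[p][c][i] for p in P for i in I) for c in C]
    m_i = [mean(x[p][c][i] for p in P for c in C) for i in I]
    m_pc = [[mean(x[p][c][i] for i in I) for c in C] for p in P]
    m_pi = [[mean(x[p][c][i] for c in C) for i in I] for p in P]
    m_ci = [[mean(x[p][c][i] for p in P) for i in I] for c in C]

    # Nested design (p:c) x i: effect is X_pi:c - X_p:c - X_ci + X_c, f = 1.
    ss_nested = sum((x[p][c][i] - m_pc[p][c] - m_ci[c][i] + m_c[c]) ** 2
                    for p, c, i in cells)
    # Crossed design p x c x i: two-way pi effect (f = n_c) and three-way pic effect.
    ss_pi = len(C) * sum((m_pi[p][i] - m_p[p] - m_i[i] + grand) ** 2
                         for p in P for i in I)
    ss_pic = sum((x[p][c][i] - m_pc[p][c] - m_pi[p][i] - m_ci[c][i]
                  + m_p[p] + m_c[c] + m_i[i] - grand) ** 2
                 for p, c, i in cells)
    return ss_nested, ss_pi + ss_pic


# Hypothetical scores: 2 persons per class, 2 classes, 3 items.
scores = [[[1, 3, 2], [4, 4, 6]],
          [[2, 5, 3], [7, 5, 9]]]
nested, pooled = ss_pi_within_c(scores)
```

The two returned values agree up to floating-point rounding, which is the property that lets a crossed-design program stand in for a nested one.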




G Study Considerations and the Estimation of Variance Components
for the Random Effects Model

Whereas classical analysis of variance procedures typically emphasize
F-tests, generalizability theory emphasizes the estimation of variance
components. According to the most recent edition of Standards for Educational &
Psychological Tests (APA, 1974): the "estimation of clearly labeled components
of score variance is the most informative outcome of a reliability study, both
for the test developer wishing to improve the reliability of his instrument
and for the user desiring to interpret test scores with maximum understanding"
(p. 49).

Variance Components: Notation and Terminology

The variance component associated with the component $\alpha$ is, by definition, the
variance of the universe score effect for the component $\alpha$. Consider, for example,
the design p x i, which can be represented as:

$$X_{pi} = \mu + \mu_{p}^{\sim} + \mu_{i}^{\sim} + \mu_{pi}^{\sim} + e, \tag{7}$$

where $\mu$ = grand mean in the universe,
$\mu_{p}^{\sim}$ = effect for person p,
$\mu_{i}^{\sim}$ = effect for item i,
$\mu_{pi}^{\sim}$ = effect for the interaction of person p with item i, and
$e$ = random error.

The variance for the component p is denoted $\sigma^{2}(\mu_{p}^{\sim})$, which is abbreviated $\sigma^{2}(p)$.
That is, $\sigma^{2}(p)$ is the variance of $\mu_{p}^{\sim}$ over all persons in the universe
(or population) of admissible observations.

Similarly, $\sigma^{2}(pi)$ is the variance of the component pi; or, more specifically,
the variance of $\mu_{pi}^{\sim}$ in the universe. However, $\sigma^{2}(pi)$ is confounded with
random error variance. To account for this confounding, Cronbach et al.
(1972) denote this variance component $\sigma^{2}(pi,e)$. Using the notational system
discussed above, the component that consists of all indices in the design is
always confounded with random error. Therefore, it is not necessary to
explicitly indicate this confounding in the notation for variance components, and we
will not do so here. As another example, consider the component pi:c in the
design (p:c) x i (see Equation 1 and Table 4). Here, the variance of this
component is denoted $\sigma^{2}(pi{:}c)$. Cronbach et al. (1972), however,
represent this variance component by $\sigma^{2}(pi,pic,e)$, which explicitly indicates both
the confounding resulting from the nesting of p within c, and the confounding
of random error with pi:c.

For the design p x i (see Equation 7), the variance of $X_{pi}$ over all persons
and items is:

$$\sigma^{2}(X_{pi}) = \mathcal{E}_{p,i}\,(X_{pi} - \mu)^{2} \tag{8}$$

$$= \sigma^{2}(p) + \sigma^{2}(i) + \sigma^{2}(pi). \tag{9}$$



Since the variance components in Equation 9 are non-negative and independent,
none of them can be greater than the maximum value of $\sigma^{2}(X_{pi})$. If, for example,
items are scored (0,1), then no variance component can be greater than 0.25, the
maximum value of $\sigma^{2}(X_{pi})$. In effect, each variance component in Equation 9
represents that part of $\sigma^{2}(X_{pi})$ uniquely attributable to the component. (This,
of course, is not true for mean squares.) Furthermore, since $X_{pi}$ is the observed
score for a single person and a single item, the variance components in Equation 9
are for a single person, a single item, and a single person-item combination,
respectively. It is both usual and highly advisable to report G study variance
components for single observations based on sampling one condition of each
facet. These G study variance components can be used easily in subsequent
D studies that involve sampling any number of conditions of each facet.

There are several procedures that might be used to estimate variance
components. For example, Cornfield and Tukey (1956), Cronbach et al. (1972), Millman
and Glass (1967), and most experimental design texts (e.g., Kirk, 1968) discuss
procedures for obtaining the expected value of mean squares in terms of variance
components. The resulting set of equations can be solved to express estimated
variance components in terms of mean squares (see Endler, 1966). Also, using
these procedures, expected mean squares and estimated variance components can be
obtained for models other than the random effects model. These procedures,
however, are often more general and more complicated than the requirements of
a generalizability analysis demand. For example, usually a G study does not
directly require expected mean squares. Furthermore, it is usually best to
perform a G study under the assumptions of a random effects model.

The terms "random," "fixed," and "mixed effects" are common in the
context of analysis of variance, but they have been used less frequently in
the context of generalizability theory. In the usual terminology of
generalizability theory, a facet is random if conditions of the facet are randomly
sampled from an infinite (or essentially infinite) universe of possible conditions
for the facet. Notationally, if n is the sample size for some facet, and N is the
size of the universe for the facet, then the facet is random if $n < N \to \infty$.
A facet is fixed if n = N. If all facets are random, then the design is a
random effects design. Similarly, if all facets are fixed, then the design
is a fixed effects design. If some facets are fixed and some random, then
the design is a mixed effects design. For a G study it is almost always best
to estimate variance components under the assumptions of a random effects model.



The variance components resulting from a random effects analysis of G study
data can be used easily in subsequent D studies that employ random, fixed, or
mixed models. The only important exception to this general rule involves
random sampling from a finite universe, which is treated later.

Algorithm 2: Estimation of Variance Components for Random Effects Models

Another procedure for estimating variance components entails the use of
Venn diagrams (see Cronbach et al., 1972). This procedure (which is
illustrated later) is quite useful when the random effects model is employed in a
design that is relatively uncomplicated. However, the Venn diagram approach
is rather difficult to use with more complicated designs. The following
algorithm reflects the Venn diagram approach, but it does not require the use
of Venn diagrams.

Assume that $\alpha$ is some component consisting of k indices. Here, it does
not matter whether an index in $\alpha$ is nested or not. In general, the estimated
value of the variance of the component $\alpha$, for the random effects model, is:

$$\hat{\sigma}^{2}(\alpha) = \frac{1}{f(\alpha)} \left[ \text{some combination of mean squares} \right],$$

where $f(\alpha)$ has been defined in Equation 4, and the appropriate combination
of mean squares is:

Step 0: MS($\alpha$);

Step 1: Minus the mean squares for all components that consist of
        the k indices in $\alpha$ and exactly one additional index (call
        the set of additional indices A);

Step 2: Plus the mean squares for all components that consist of
        the k indices in $\alpha$ and any two of the A indices;

Step 3: Minus the mean squares for all components that consist of
        the k indices in $\alpha$ and any three of the A indices;

        ...

Step i: Plus (if i is even) or Minus (if i is odd) the mean squares
        for all components that consist of the k indices in $\alpha$ and
        any i of the A indices.

The algorithm terminates when a step results in no mean squares added or
subtracted.
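The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the algorithm, not a definitive implementation: components are represented as sets of index letters, and the function names, sample sizes, and mean squares are hypothetical values chosen for the example.

```python
from math import prod


def mean_square_combination(alpha, components):
    """Return {component name: +1 or -1} following Steps 0, 1, 2, ... of Algorithm 2."""
    alpha_set = components[alpha]
    signs = {alpha: 1}                                   # Step 0: MS(alpha)
    # Components that contain every index in alpha, keyed to their extra indices.
    extras = {name: idx - alpha_set
              for name, idx in components.items() if alpha_set < idx}
    # Step 1 defines A: the additional indices of one-extra-index components.
    A = set()
    for extra in extras.values():
        if len(extra) == 1:
            A |= extra
    step = 1
    while True:
        hits = [name for name, extra in extras.items()
                if len(extra) == step and extra <= A]
        if not hits:                                     # nothing added or subtracted
            break
        for name in hits:
            signs[name] = 1 if step % 2 == 0 else -1     # plus if even, minus if odd
        step += 1
    return signs


def estimate_variance(alpha, components, sizes, ms):
    """Divide the signed combination of mean squares by f(alpha) (Equation 4)."""
    signs = mean_square_combination(alpha, components)
    f = prod(n for index, n in sizes.items() if index not in components[alpha])
    return sum(sign * ms[name] for name, sign in signs.items()) / f


# Design (p:c) x i, with hypothetical sample sizes and mean squares.
design = {"c": {"c"}, "p:c": {"p", "c"}, "i": {"i"},
          "ci": {"c", "i"}, "pi:c": {"p", "c", "i"}}
sizes = {"p": 4, "c": 3, "i": 5}
ms = {"c": 30.0, "p:c": 6.0, "i": 20.0, "ci": 4.0, "pi:c": 2.0}

var_c = estimate_variance("c", design, sizes, ms)
var_pc = estimate_variance("p:c", design, sizes, ms)
```

For the component c the sketch selects MS(c) - MS(p:c) - MS(ci) + MS(pi:c), and for p:c it selects MS(p:c) - MS(pi:c). Restricting later steps to indices in A is what keeps MS(pi:c) out of the combination for the component i.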




For some components, no steps are required. For example, the estimated
variance of the component that contains all indices in the design is simply
the mean square of that component. Also, except in quite complicated designs,
it is rare that more than two steps are required to obtain the estimated
variance component in terms of mean squares. The actual number of steps
required for any component in any design is at most t - k, where t is the total
number of indices in the design.
Tables 1-5 provide equations for estimating the variance of the
components in each of the five illustrative designs, assuming the random effects
model. Consider, for example, the component $\alpha$ = pi:c in the design (p:c) x i.
Since all indices in the design are included in $\alpha$, $f(\alpha) = 1$ and Step 1 results
in no mean squares subtracted from MS(pi:c); therefore,

$$\hat{\sigma}^{2}(pi{:}c) = MS(pi{:}c).$$

For the component $\alpha$ = p:c in the same design, $f(\alpha)$ is simply $n_{i}$. Step 1
results in subtracting only MS(pi:c) from MS(p:c), since pi:c is the only
component in the design that contains $\alpha$ (i.e., p:c) and one additional index (i).
Step 2 results in no mean squares added. Therefore,

$$\hat{\sigma}^{2}(p{:}c) = [MS(p{:}c) - MS(pi{:}c)] / n_{i}.$$

For the component $\alpha$ = c in the design (p:c) x i, the product of the
sample sizes for the indices not included in $\alpha$ is $n_{p} n_{i}$. Step 1 results in
subtracting both MS(p:c) and MS(ci) from MS(c). Step 2 results in adding
MS(pi:c). Step 3 results in no mean squares subtracted. Therefore,

$$\hat{\sigma}^{2}(c) = [MS(c) - MS(p{:}c) - MS(ci) + MS(pi{:}c)] / n_{p} n_{i}.$$



Insert Figure 2 about here 




Figure 2 uses Venn diagrams to illustrate the estimation of the variance
of the three components discussed above. In the Venn diagram approach, a mean
square for a main effect is represented by a circle; a mean square for an
interaction is represented by the intersection of two or more circles; and a variance
component is represented by a part of a circle that usually looks like a
phase of the moon. More specifically, a part of a circle represents $f(\alpha)\hat{\sigma}^{2}(\alpha)$.
The Venn diagram approach to determining estimates of variance components is
quite useful for relatively simple designs, such as p x i and (p:c) x i.
However, this approach is not possible with some complicated designs, and
this approach is difficult to employ with designs that involve considerable
nesting, such aCS the design (£:£) x ^ 

Algorithm 2 provides an estimate of the magnitude of a variance component,
not its statistical significance. Even if a variance component is not
statistically significant, it is an unbiased estimate, and it is better to
use it than to replace it with zero (Cronbach et al., 1972). Nevertheless,
estimated variance components, like other statistics, are subject to sampling
variation. This topic is outside the intended scope of this paper, but
pertinent issues are treated by Cronbach et al. (1972, pp. 49-56), by
Searle (1971), and to some extent by Scheffé (1959) and Winer (1971). If,
however, Algorithm 2 results in a negative estimate of a variance component,
then the use of either Algorithm 2 or the Venn diagram approach is questionable.
Procedures for treating negative estimates are discussed by Cronbach et al.
(1972, p. 57). One such procedure involves use of expected mean squares.

Expected Mean Squares

Although a G study usually does not require expected mean squares, it
is easy to obtain them for the random effects model using the notation introduced
in this paper. In general, for the random effects model, the expected mean
square associated with the component $\beta$ is:

$$EMS(\beta) = \sum_{\alpha} f(\alpha)\,\sigma^{2}(\alpha), \tag{10}$$

where $\alpha$ is any component that contains all of the indices in $\beta$, $f(\alpha)$ is defined
by Equation 4, and $\sigma^{2}(\alpha)$ is the random effects variance component for $\alpha$.

Consider, for example, the component p in the design p x (i:s). From
Figure 1 and Table 3, it is clear that the components that contain the index
p are p, ps, and pi:s. Applying Equation 4 to these components gives $f(p) = n_{i} n_{s}$,
$f(ps) = n_{i}$, and $f(pi{:}s) = 1$. Therefore,

$$EMS(p) = \sigma^{2}(pi{:}s) + n_{i}\,\sigma^{2}(ps) + n_{i} n_{s}\,\sigma^{2}(p). \tag{11a}$$

Similarly,

$$EMS(s) = \sigma^{2}(pi{:}s) + n_{i}\,\sigma^{2}(ps) + n_{p}\,\sigma^{2}(i{:}s) + n_{p} n_{i}\,\sigma^{2}(s), \tag{11b}$$

$$EMS(i{:}s) = \sigma^{2}(pi{:}s) + n_{p}\,\sigma^{2}(i{:}s), \tag{11c}$$

$$EMS(ps) = \sigma^{2}(pi{:}s) + n_{i}\,\sigma^{2}(ps), \tag{11d}$$

$$EMS(pi{:}s) = \sigma^{2}(pi{:}s). \tag{11e}$$
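Equation 10 amounts to a lookup over the components of the design, which the following Python sketch makes concrete. The function name `expected_mean_square`, the set-based representation of components, and the sample sizes are assumptions introduced for illustration.

```python
from math import prod


def expected_mean_square(beta, components, sizes):
    """Equation 10: map each component alpha that contains all indices of beta
    to its coefficient f(alpha), the product of the sample sizes for the
    indices not in alpha (Equation 4)."""
    beta_set = components[beta]
    return {name: prod(n for index, n in sizes.items() if index not in idx)
            for name, idx in components.items() if beta_set <= idx}


# Design p x (i:s) with hypothetical G study sample sizes
# (n_p persons, n_i items per subtest, n_s subtests).
design = {"p": {"p"}, "s": {"s"}, "i:s": {"i", "s"},
          "ps": {"p", "s"}, "pi:s": {"p", "i", "s"}}
sizes = {"p": 10, "i": 4, "s": 3}

ems_p = expected_mean_square("p", design, sizes)
ems_s = expected_mean_square("s", design, sizes)
```

With these assumed sizes, `ems_p` pairs pi:s with 1, ps with n_i, and p with n_i n_s, which reproduces the coefficient pattern of Equation 11a; `ems_s` reproduces the pattern of Equation 11b.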



Perhaps the most important use of expected mean squares in a G study is to
estimate variance components when Algorithm 2 or the Venn diagram approach
results in one or more negative estimates for variance components. Consider,
for example, the expected mean squares provided by Equations 11a-11e for the
p x (i:s) design. Equation 11e can be used to estimate $\sigma^{2}(pi{:}s)$; and then
Equation 11d can be used to estimate $\sigma^{2}(ps)$. If the estimate of $\sigma^{2}(ps)$ is
negative, then zero is substituted for the negative estimate, and this zero
is carried forward as the estimate of $\sigma^{2}(ps)$ in all other expected mean square
equations. This "plausible solution" to the negative estimate problem is
suggested by Cronbach et al. (1972, p. 57).
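The zero-substitution procedure can be sketched as a bottom-up solution of Equations 11a-11e, with observed mean squares standing in for their expectations. The sketch is hedged in two ways: the mean squares and sample sizes are hypothetical, and truncating every component at zero (rather than only the ps component discussed in the text) is an assumption of this example.

```python
def solve_with_zero_substitution(ms, n_p, n_i, n_s):
    """Solve Equations 11e, 11d, 11c, 11a, 11b in turn for the p x (i:s) design,
    replacing any negative estimate by zero and carrying the zero forward."""
    est = {}
    est["pi:s"] = ms["pi:s"]                                            # 11e
    est["ps"] = max(0.0, (ms["ps"] - est["pi:s"]) / n_i)                # 11d
    est["i:s"] = max(0.0, (ms["i:s"] - est["pi:s"]) / n_p)              # 11c
    est["p"] = max(0.0, (ms["p"] - est["pi:s"] - n_i * est["ps"])
                   / (n_i * n_s))                                       # 11a
    est["s"] = max(0.0, (ms["s"] - est["pi:s"] - n_i * est["ps"]
                         - n_p * est["i:s"]) / (n_p * n_i))             # 11b
    return est


# Hypothetical mean squares chosen so that the raw estimate for ps is negative.
mean_squares = {"p": 14.0, "s": 52.0, "i:s": 12.0, "ps": 1.6, "pi:s": 2.0}
estimates = solve_with_zero_substitution(mean_squares, n_p=10, n_i=4, n_s=3)
```

Here the raw estimate for ps is (1.6 - 2.0)/4, which is negative, so zero is used. The estimate for p then becomes (14 - 2 - 0)/12 = 1.0, whereas Algorithm 2 applied directly would give (14 - 1.6)/12, a slightly larger value.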




D Studies for Random, Fixed, and Mixed Models



The primary result of a typical G study is the estimated random effects
variance components for the G study design. These G study variance components
are for single observations based on random sampling of one condition of each
facet from an infinite universe of admissible conditions (or observations) for
the facet. By comparison, a decision maker will want to use these results in
some D study that involves its own sample size, n', and universe size, N', for
each facet in the universe of generalization. If, for example, $N' \to \infty$, then
the facet involves sampling from an infinite universe of generalization; and if
n' = N', then the facet is fixed in the universe of generalization. Here and
in Cronbach et al. (1972) n refers to the size of the sample and N to the size of
the universe of admissible observations from the G study. Similarly, n' refers
to the sample size and N' to the size of the universe of generalization defined
by some D study.

In performing a D study, then, the decision maker must specify, directly
or indirectly, the sample sizes and universe sizes for each of the facets in
the universe of generalization. In addition, the decision maker must specify
the object of measurement. It is usually the case that the facet for persons,
or some aggregate of persons (such as a class), serves as the object of measurement
in a D study. However, any facet could serve as the object of measurement
(see Cardinet, Tourneur, & Allal, 1976). Suppose, for example, that the design
(p:c) x i were used in the G study. A D study might use persons, items, or class
means as the objects of measurement. In some literature the terms "object of
measurement" and "unit of analysis" are used synonymously.
However, recently the unit of analysis issue has been viewed primarily
in the context of choosing an appropriate unit of analysis (see Cronbach,
Deken, and Webb, 1976; Haney, 1974b). This, of course, is an important issue,
but it is outside the scope of this paper. Our concern here is with issues
in analyzing D study data once the object of measurement has been chosen.
In order to avoid ambiguity, therefore, we use the term "object of measurement"
rather than "unit of analysis."

D Study Variance Components

Suppose a G study is conducted using the design p x i x o. Table 2
provides the estimated random effects variance components resulting from such
a G study. A typical D study, associated with such a G study, might use p
as the object of measurement. For such a D study, the observed score for person
p, assuming an infinite universe of generalization for the item and occasion
facets, can be represented as:

$$X_{p} = X_{pIO} = \mu + \mu_{p}^{\sim} + \mu_{I}^{\sim} + \mu_{O}^{\sim} + \mu_{pI}^{\sim} + \mu_{pO}^{\sim} + \mu_{IO}^{\sim} + \mu_{pIO}^{\sim}, \tag{12}$$

where experimental error is completely confounded with $\mu_{pIO}^{\sim}$. In Equation 12
an upper-case subscript indicates the mean for a D study sample of size n'; i.e.,

$$X_{pIO} = \frac{1}{n'_{i}\, n'_{o}} \sum_{i=1}^{n'_{i}} \sum_{o=1}^{n'_{o}} X_{pio},$$

where $X_{pio} = \mu + \mu_{p}^{\sim} + \mu_{i}^{\sim} + \mu_{o}^{\sim} + \mu_{pi}^{\sim} + \mu_{po}^{\sim} + \mu_{io}^{\sim} + \mu_{pio}^{\sim}$.

Note that here we use the abbreviation $X_{p}$ to mean $X_{pIO}$, where p is the object
of measurement for the D study.

For each of the score effects in Equation 12, the estimated D study
variance component is obtained by dividing the estimated G study variance
component by the frequency of sampling the effect within the object of measurement.
In general, the frequency of sampling the component $\alpha$ within the object of
measurement component $\gamma$ is:

$$d(\alpha|\gamma) = \begin{cases} 1, & \text{if } \alpha \text{ contains only indices in } \gamma; \text{ and, otherwise,} \\ \text{the product of the D study sample sizes for all indices in } \alpha \text{ that are not in } \gamma. \end{cases} \tag{13}$$

For example, for the component p in the D study design represented by the
structural model in Equation 12, $d(\alpha|\gamma) = d(p|p) = 1$; the estimated D study
variance of the component p is $\hat{\sigma}^{2}(p)/1 = \hat{\sigma}^{2}(p)$. For the component I,
$d(\alpha|\gamma) = d(i|p) = n'_{i}$, and the estimated D study variance
component for I is $\hat{\sigma}^{2}(I) = \hat{\sigma}^{2}(i)/n'_{i}$. For the component pI, $d(\alpha|\gamma) = d(pi|p) = n'_{i}$,
and the estimated D study variance component for pI is $\hat{\sigma}^{2}(pI) = \hat{\sigma}^{2}(pi)/n'_{i}$.
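The division rule of Equation 13 can be sketched in Python for the design p x i x o with p as the object of measurement. The function names, the convention of naming each component by a string of single-letter indices, and all numerical values are hypothetical and serve only to illustrate the rule.

```python
from math import prod


def sampling_frequency(alpha, gamma, d_sizes):
    """Equation 13: product of the D study sample sizes for the indices of
    alpha that are not in gamma (1 when there are no such indices)."""
    return prod(d_sizes[index] for index in alpha if index not in gamma)


def d_study_components(g_components, gamma, d_sizes):
    """Divide each G study variance component by d(alpha | gamma)."""
    return {name: variance / sampling_frequency(set(name), gamma, d_sizes)
            for name, variance in g_components.items()}


# Hypothetical G study estimates for p x i x o (single-observation metric).
g_estimates = {"p": 0.50, "i": 0.20, "o": 0.10, "pi": 0.40,
               "po": 0.30, "io": 0.06, "pio": 1.20}
# Hypothetical D study sample sizes n'_i and n'_o.
d_sizes = {"i": 10, "o": 2}

d_estimates = d_study_components(g_estimates, gamma={"p"}, d_sizes=d_sizes)
```

In this sketch the entry for "pi" is the G study estimate divided by n'_i, and the entry for "pio" is divided by n'_i n'_o, in line with the mean score metric. Multiplying instead of dividing would give the total score metric.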

All D study variance components for the design p x i x o are reported in
Table 8. It is important to note that these variance components are for a
random effects D study; i.e., $n'_{i} < N'_{i} \to \infty$ and $n'_{o} < N'_{o} \to \infty$. It is also possible
to express D study variance components in terms of a model different from the
random effects model. (See subsequent discussion of sampling from a finite
universe.) However, even when one or more facets is fixed in the universe
of generalization, it is usually more informative to use and report the random
effects D study variance components. Various combinations of these components
provide the summary statistics typically used in a D study. The only important
exception occurs in the case of sampling from a finite universe of generalization;
this possibility is considered later.



Insert Table 8 about here



By convention, here and in Cronbach et al. (1972), D study estimated variance
components and summary statistics formed from them are expressed in terms of
mean scores. For example, $\hat{\sigma}^{2}(I) = \hat{\sigma}^{2}(i)/n'_{i}$ in Table 8 is the D study estimated
variance component associated with the mean score for a sample of $n'_{i}$ items. It
is also possible to express D study variance components in terms of total scores.
For example, the D study estimated variance component associated with the total
score for a sample of $n'_{i}$ items is $n'_{i}\,\hat{\sigma}^{2}(i)$. In general, for the total score metric
D study components are obtained by multiplying (rather than dividing) G study
variance components by the sampling frequency within the object of measurement
(see Equation 13).

D Study Summary Statistics

D study variance components are useful in and of themselves, because they
provide a direct indication of the relative magnitude, in the D study, of each
of the independently estimable components of score variance. In addition, D study
variance components are frequently used to estimate one or more of the following:

$\sigma^{2}(\tau)$: the universe score variance for the object of measurement $\tau$,
        which is analogous to the true score variance in classical
        test theory;

$\mathcal{E}\sigma^{2}(X)$: the expected observed score variance, which is the expected
        value over design replications of observed deviation scores;

$\mathcal{E}\rho^{2}$: an intraclass correlation coefficient, called a coefficient
        of generalizability, which is analogous to a reliability
        coefficient in classical test theory;

$\mathcal{E}\sigma^{2}(\delta)$: the error variance for making comparative decisions among
        the objects of measurement (e.g., persons), which is analogous
        to the error of measurement in classical test theory. "The
        error $\delta$ is the discrepancy between the observed deviation
        score and the universe score expressed in deviation form"
        (Cronbach et al., 1972, p. 25);

$\sigma^{2}(\Delta)$: the average error variance within an object of measurement
        (e.g., person), where error is defined as the difference
        between observed and universe score; and

$\sigma^{2}(\epsilon)$: the variance of errors of estimate from the linear regression
        of universe scores on observed scores; that is, $\sigma^{2}(\epsilon)$ is the
        variance of the discrepancies between estimated and actual
        universe scores.



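The following equations provide some useful relationships among estimates
of the statistics introduced above:

$$\mathcal{E}\hat{\sigma}^{2}(X) = \hat{\sigma}^{2}(\tau) + \mathcal{E}\hat{\sigma}^{2}(\delta), \tag{14}$$

$$\mathcal{E}\hat{\rho}^{2} = \hat{\sigma}^{2}(\tau) \,/\, \mathcal{E}\hat{\sigma}^{2}(X), \tag{15}$$

$$\hat{\sigma}^{2}(\epsilon) = \hat{\sigma}^{2}(\tau)\,(1 - \mathcal{E}\hat{\rho}^{2}). \tag{16}$$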



Equation 14 is analogous to the classical test theory result that the
variance of observed scores equals the variance of true scores plus the variance
of error scores. Note, in particular, that the error variance in Equation 14 is
$\mathcal{E}\sigma^{2}(\delta)$, not $\sigma^{2}(\Delta)$. The latter has no clear analogue in classical test theory
with its emphasis on parallel measurements (see Lord, 1962); however, Brennan and
Kane (in press-b) show that $\sigma^{2}(\Delta)$ is related to a type of error variance discussed
by Lord (1957) prior to the advent of generalizability theory. Also, Brennan
and Kane (in press-a, in press-b) show that $\sigma^{2}(\Delta)$ is usually an appropriate
estimate of error variance for domain-referenced mastery tests, whereas
$\mathcal{E}\sigma^{2}(\delta)$ is seldom appropriate.



As implied by Equation 15, a coefficient of generalizability is defined
as the ratio of universe score variance to expected observed score variance.
In terms of estimates, $\mathcal{E}\hat{\rho}^{2}$ is a consistent estimator of $\sigma^{2}(\tau)/\mathcal{E}\sigma^{2}(X)$ because
$\hat{\sigma}^{2}(\tau)$ and $\mathcal{E}\hat{\sigma}^{2}(X)$ are both unbiased estimates (see Lord and Novick, 1968,
pp. 201-203). Also, the notation $\mathcal{E}\rho^{2}$ is indicative of the fact that a
generalizability coefficient can be interpreted as a squared correlation or
intraclass correlation coefficient (Cronbach, Ikeda, & Avner, 1964), as well
as an approximation to the expected value of the correlation between pairs of
measurements (Cronbach et al., 1972, Chapter 8).

In Equation 16, $\hat{\sigma}^{2}(\epsilon)$ is strictly appropriate only if the regression
equation for universe scores on observed scores is determined from the actual
conditions used in the D study. Otherwise, $\hat{\sigma}^{2}(\epsilon)$ in Equation 16 is an
underestimate of $\sigma^{2}(\epsilon)$ (see Cronbach et al., 1972, pp. 78-764).
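The relationships in Equations 14-16 are simple enough to express as a short Python sketch. The function name `summary_statistics` and the input values are hypothetical; the inputs stand for estimates of the universe score variance and of the error variance for comparative decisions.

```python
def summary_statistics(var_tau, var_delta):
    """Combine estimates of sigma^2(tau) and E sigma^2(delta).

    Returns (expected observed score variance, generalizability coefficient,
    variance of errors of estimate), following Equations 14, 15, and 16.
    """
    expected_observed = var_tau + var_delta          # Equation 14
    gen_coefficient = var_tau / expected_observed    # Equation 15
    var_epsilon = var_tau * (1.0 - gen_coefficient)  # Equation 16
    return expected_observed, gen_coefficient, var_epsilon


# Hypothetical D study estimates.
expected_observed, gen_coefficient, var_epsilon = summary_statistics(0.50, 0.25)
```

With these assumed inputs, two thirds of the expected observed score variance is universe score variance, and the variance of errors of estimate is smaller than the error variance for comparative decisions.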




Combining D Study Variance Components

In order to determine which variance components enter each of the summary
statistics defined above, it is necessary that the D study be clearly specified.
Here, the nature of a particular D study employing a specific design will be
identified in the following manner: $D(\gamma|V|F|R)$, where

$\gamma$ = object of measurement component (i.e., the facet that serves
    as the object of measurement for the D study);

V = main effect index in $\gamma$;

F = the set of facets that are fixed in the universe of generalization
    (i.e., facets for which n' = N'); and

R = the set of random facets (i.e., facets for which the D study
    contains a random sample of n' conditions from the universe
    of generalization for the facet).

Here, for random facets, it is assumed that the universe of generalization is
infinite (i.e., $n' < N' \to \infty$); later, we consider random sampling of conditions
from a finite universe of generalization.



In the notation $D(\gamma|V|F|R)$, F and R specify the universe of generalization,
and every index in the D study design is in V, F, or R. There are, however, two
restrictions on $D(\gamma|V|F|R)$. First, each index in $\gamma$ must be in either V or F
but not in both. For example, if p:c is the object of measurement component $\gamma$,
then p might be in V and c in F, but c could not be in both V and F. Second,
there must be at least one index in R in order to make the D study informative;
otherwise the D study would not involve generalization over any facet.


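An Algebraic Procedure. Given $D(\gamma|V|F|R)$, the components that enter
$\sigma^{2}(\tau)$, $\mathcal{E}\sigma^{2}(X)$, $\mathcal{E}\sigma^{2}(\delta)$, and $\sigma^{2}(\Delta)$ are, respectively:

$$\tau(\gamma|V|F|R) = \mathcal{E}_{R} X_{\gamma} - \mathcal{E}_{V}\mathcal{E}_{R} X_{\gamma}, \tag{17}$$

$$X(\gamma|V|F|R) = X_{\gamma} - \mathcal{E}_{V} X_{\gamma}, \tag{18}$$

$$\delta(\gamma|V|F|R) = (X_{\gamma} - \mathcal{E}_{V} X_{\gamma}) - (\mathcal{E}_{R} X_{\gamma} - \mathcal{E}_{V}\mathcal{E}_{R} X_{\gamma}), \tag{19}$$

$$\text{and } \Delta(\gamma|V|F|R) = X_{\gamma} - \mathcal{E}_{R} X_{\gamma}, \tag{20}$$

where each expectation is taken over the population or universe.
In Equations 17-20, $\mathcal{E}_{R} X_{\gamma}$ is the universe score for the D study, and $\tau(\gamma|V|F|R)$
is the universe deviation score. Similarly, $X_{\gamma}$ is the observed score for the
D study and $X(\gamma|V|F|R)$ is the observed deviation score.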

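Consider, for example, the observed score for the design p x i x o in
Equation 12, and suppose that the D study is $D(\gamma|V|F|R) = D(p|p|-|I,O)$, where
the symbol "-" is used to indicate that there are no fixed facets in the
universe of generalization.

From Equation 17, the components that enter $\sigma^{2}(\tau)$ are:

$$\tau(p|p|-|I,O) = \mathcal{E}_{I}\mathcal{E}_{O} X_{p} - \mathcal{E}_{p}\mathcal{E}_{I}\mathcal{E}_{O} X_{p} = \mu_{p} - \mu = \mu_{p}^{\sim};$$

and, therefore,

$$\sigma^{2}(\tau) = \sigma^{2}(p). \tag{21}$$

From Equation 18, the components that enter $\mathcal{E}\sigma^{2}(X)$ are:

$$X(p|p|-|I,O) = X_{p} - \mathcal{E}_{p} X_{p} = \mu_{p}^{\sim} + \mu_{pI}^{\sim} + \mu_{pO}^{\sim} + \mu_{pIO}^{\sim};$$

and, therefore,

$$\mathcal{E}\sigma^{2}(X) = \sigma^{2}(p) + \sigma^{2}(pI) + \sigma^{2}(pO) + \sigma^{2}(pIO). \tag{22}$$

Note that $\mathcal{E}\sigma^{2}(X)$ is different from the total variance, $\sigma^{2}(X)$, which is the
sum of all D study variance components (see Equation 12).

From Equation 19, the components that enter $\mathcal{E}\sigma^{2}(\delta)$ are:

$$\delta(p|p|-|I,O) = (X_{p} - \mathcal{E}_{p} X_{p}) - (\mathcal{E}_{I}\mathcal{E}_{O} X_{p} - \mathcal{E}_{p}\mathcal{E}_{I}\mathcal{E}_{O} X_{p})$$

$$= (X_{p} - \mu_{IO}) - (\mu_{p} - \mu)$$

$$= \mu_{pI}^{\sim} + \mu_{pO}^{\sim} + \mu_{pIO}^{\sim};$$

and, therefore,

$$\mathcal{E}\sigma^{2}(\delta) = \sigma^{2}(pI) + \sigma^{2}(pO) + \sigma^{2}(pIO). \tag{23}$$

From the above results it is clear that $\mathcal{E}\sigma^{2}(X)$ equals the sum of $\sigma^{2}(\tau)$ and
$\mathcal{E}\sigma^{2}(\delta)$, as indicated in Equation 14.

Finally, from Equation 20, the components that enter $\sigma^{2}(\Delta)$ are:

$$\Delta(p|p|-|I,O) = X_{p} - \mathcal{E}_{I}\mathcal{E}_{O} X_{p} = X_{p} - \mu_{p} = \mu_{I}^{\sim} + \mu_{O}^{\sim} + \mu_{pI}^{\sim} + \mu_{pO}^{\sim} + \mu_{IO}^{\sim} + \mu_{pIO}^{\sim};$$

and, therefore,

$$\sigma^{2}(\Delta) = \sigma^{2}(I) + \sigma^{2}(O) + \sigma^{2}(pI) + \sigma^{2}(pO) + \sigma^{2}(IO) + \sigma^{2}(pIO). \tag{24}$$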



With the exception of $\mathcal{E}\hat{\sigma}^{2}(X)$, estimates of the above results are
reflected in the fourth column of Table 8. $\mathcal{E}\hat{\sigma}^{2}(X)$ is most easily obtained
using Equation 14; and, of course, Equations 15 and 16 can be used to obtain
$\mathcal{E}\hat{\rho}^{2}$ and $\hat{\sigma}^{2}(\epsilon)$.

A Notational Procedure. The procedure represented by Equations 17 to 20
is a straightforward application of generalizability theory, but it does
involve some degree of algebraic complexity. A simpler procedure involves a
direct application of the notation for variance components used in this paper.
If the D study is $D(\gamma|V|F|R)$, then:

(a) variance components that enter $\mathcal{E}\sigma^{2}(X)$ are all variance components
    that contain the index in V;

(b) variance components that enter $\sigma^{2}(\tau)$ are all variance components
    that contain the index in V and do not contain
    any index in R;

(c) variance components that enter $\mathcal{E}\sigma^{2}(\delta)$ are all variance components
    that contain the index in V and one or more of the indices
    in R; and

(d) variance components that enter $\sigma^{2}(\Delta)$ are all variance components
    that contain one or more of the indices in R.
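Rules (a) through (d) can be sketched as a small Python classifier. This is an illustration under stated assumptions: each component is named by a string of single-letter indices, V is given as one letter, R as a set of letters, and the function and key names are hypothetical.

```python
def classify_components(components, v, r):
    """Sort D study variance components into the four statistics they enter.

    components: iterable of names such as "pI" (one letter per index)
    v: the main effect index in the object of measurement
    r: set of indices for the random facets
    """
    entering = {"expected_observed": [], "tau": [], "delta": [], "Delta": []}
    for name in components:
        indices = set(name)
        has_v = v in indices
        has_r = bool(indices & r)
        if has_v:                      # rule (a)
            entering["expected_observed"].append(name)
        if has_v and not has_r:        # rule (b)
            entering["tau"].append(name)
        if has_v and has_r:            # rule (c)
            entering["delta"].append(name)
        if has_r:                      # rule (d)
            entering["Delta"].append(name)
    return entering


# D study components for the design p x i x o with D(p|p|-|I,O).
names = ["p", "I", "O", "pI", "pO", "IO", "pIO"]
result = classify_components(names, v="p", r={"I", "O"})
```

For this D study the sketch places only p under "tau", places pI, pO, and pIO under "delta", and places every component except p under "Delta".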

For example, for the model in Equation 12 and $D(\gamma|V|F|R) = D(p|p|-|I,O)$,
$\mathcal{E}\sigma^{2}(X)$ consists of the variance components that contain the index p in V.
These components are $\sigma^{2}(p)$, $\sigma^{2}(pI)$, $\sigma^{2}(pO)$, and $\sigma^{2}(pIO)$; therefore, $\mathcal{E}\sigma^{2}(X)$
is the result provided by Equation 22. The variance components that enter
$\sigma^{2}(\tau)$ are those which contain a p, and do not contain an I or an O (the indices
in R). The only component that satisfies these two conditions is $\sigma^{2}(p)$;
therefore, $\sigma^{2}(\tau)$ equals $\sigma^{2}(p)$, as specified by Equation 21. The variance
components that enter $\mathcal{E}\sigma^{2}(\delta)$ are those which contain a p, and one or both of
the indices I and O. These components are $\sigma^{2}(pI)$, $\sigma^{2}(pO)$, and $\sigma^{2}(pIO)$;
therefore, $\mathcal{E}\sigma^{2}(\delta)$ is the result provided by Equation 23. Similarly, all variance
components except $\sigma^{2}(p)$ contain either an I or an O, or both; therefore, $\sigma^{2}(\Delta)$
is the result provided by Equation 24.

Illustrative D Studies

In this section, the procedures for combining variance components are
discussed with reference to various D studies that might be used with each of
the five illustrative designs. The results of applying either procedure are
presented in tables similar to Table 8, and certain interesting and/or
illustrative aspects of these results are discussed in the text. (In studying
these examples it is useful to refer to the model equations in Tables 1-5
for the five illustrative designs.)


Insert Table 9 about here 



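The Design p x i. Table 9 presents a single application of the procedures
for combining variance components in the p x i design. For D(p|p|-|I),

$$\tau(p|p|-|I) = \mu_p - \mu = \mu_p^{\sim} ;$$

$$\delta(p|p|-|I) = \mu_{pI}^{\sim} ; \text{ and}$$

$$\Delta(p|p|-|I) = \mu_I^{\sim} + \mu_{pI}^{\sim} .$$

Therefore,

$$\mathcal{E}\sigma^2(\delta) = \sigma^2(pI) = \sigma^2(pi)/n_i' ;$$

$$\sigma^2(\Delta) = \sigma^2(I) + \sigma^2(pI) = \frac{\sigma^2(i)}{n_i'} + \frac{\sigma^2(pi)}{n_i'} ; \text{ and}$$

$$\mathcal{E}\rho^2 = \frac{\sigma^2(p)}{\sigma^2(p) + \sigma^2(pi)/n_i'} . \qquad (25)$$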



Equation 25 is algebraically identical to Cronbach's (1951) Coefficient $\alpha$,
and, when items are scored dichotomously, Equation 25 is identical to Kuder
and Richardson's (1937) Formula 20. However, the derivation of $\mathcal{E}\rho^2$
in Equation 25 does not require the assumption of classically parallel tests
with equal means, equal variances, and equal intercorrelations. Rather, the
derivation of $\mathcal{E}\rho^2$ requires the weaker assumption of randomly parallel tests.

Two tests are randomly parallel if they both consist of a random sample of the
same number of items from the same universe. Also, Equation 25 illustrates
the regularity that forms the basis for the Spearman-Brown Formula for changes
in test length. Increasing the number of items, $n_i'$, by a specified factor
leaves $\sigma^2(\tau)$ unchanged and decreases $\mathcal{E}\sigma^2(\delta)$ by the inverse of the factor.
This type of regularity occurs because the universe of generalization contains
only one facet, namely, the item facet. For more complicated universes of
generalization, the Spearman-Brown Formula does not usually apply.
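The connection between Equation 25 and the Spearman-Brown Formula can be checked numerically. The sketch below uses hypothetical variance components (0.05 and 0.20 are illustrative values, not results from the paper), and the function names are assumptions introduced here.

```python
def gen_coef_p_by_i(var_p, var_pi, n_items):
    """Equation 25: sigma^2(p) / [sigma^2(p) + sigma^2(pi) / n_i']."""
    rel_error = var_pi / n_items      # E sigma^2(delta) for D(p|p|-|I)
    return var_p / (var_p + rel_error)


def spearman_brown(rho, k):
    """Classical Spearman-Brown projection for a test lengthened k times."""
    return k * rho / (1 + (k - 1) * rho)


# Hypothetical G study variance components for the p x i design.
var_p, var_pi = 0.05, 0.20

rho_10 = gen_coef_p_by_i(var_p, var_pi, 10)
rho_20 = gen_coef_p_by_i(var_p, var_pi, 20)

# Doubling the number of items halves the relative error variance and
# leaves the universe score variance unchanged, so the two routes agree.
print(rho_10, rho_20, spearman_brown(rho_10, 2))
```

The agreement holds only because the universe of generalization has a single random facet; with two or more random facets the error variance does not shrink by a single common factor.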

The Design p x i x o. Table 9 treats D studies for three different
universes of generalization when the person p is the object of measurement.
The first D study, D(p|p|-|I,O), has been discussed in detail. The other
two involve a single fixed facet, I or O.

For example, for D(p|p|I|O),

$$\tau(p|p|I|O) = \mu_p^{\sim} + \mu_{pI}^{\sim} ; \text{ and}$$

$$\delta(p|p|I|O) = \mu_{pO}^{\sim} + \mu_{pIO}^{\sim} .$$

That is, $\sigma^2(\tau)$ consists of components that contain p (the index in V) and do
not contain O (the index in R), whereas $\mathcal{E}\sigma^2(\delta)$ consists of components that
contain p and O. Similarly, for D(p|p|O|I),

$$\tau(p|p|O|I) = \mu_p^{\sim} + \mu_{pO}^{\sim} ; \text{ and}$$

$$\delta(p|p|O|I) = \mu_{pI}^{\sim} + \mu_{pIO}^{\sim} .$$

That is, $\sigma^2(\tau)$ consists of components that contain p (the index in V) and do
not contain I (the index in R), whereas $\mathcal{E}\sigma^2(\delta)$ consists of components that
contain p and I.

In both studies, $\mathcal{E}\sigma^2(X)$ is identical to $\mathcal{E}\sigma^2(X)$ for D(p|p|-|I,O). This
is a particular instance of a general rule; namely, once V is specified,
$\mathcal{E}\sigma^2(X)$ is unaffected by changes in the universe of generalization. However,
the universe of generalization does affect $\sigma^2(\tau)$ and $\mathcal{E}\sigma^2(\delta)$. Using
D(p|p|-|I,O) as a basis for comparison, in D(p|p|I|O) the variance component
$\sigma^2(pI)$ moves from $\mathcal{E}\sigma^2(\delta)$ to $\sigma^2(\tau)$; and in D(p|p|O|I), $\sigma^2(pO)$ moves from
$\mathcal{E}\sigma^2(\delta)$ to $\sigma^2(\tau)$.

Going one step further, if both I and O were fixed in the universe of
generalization, then $\mathcal{E}\sigma^2(X)$ would be identical to $\sigma^2(\tau)$, $\mathcal{E}\sigma^2(\delta)$ would be
unestimable, and, therefore, $\mathcal{E}\rho^2$ would be unity. Such a result occurs whenever
there are no facets over which the decision maker generalizes. It is for
this reason that R should contain at least one index for the D study to be
informative.

As indicated in Table 8, $\sigma^2(\Delta)$ never includes the variance components in
$\sigma^2(\tau)$, and $\sigma^2(\Delta)$ always includes the variance components in $\mathcal{E}\sigma^2(\delta)$. The
remaining variance components enter $\sigma^2(\Delta)$ only if they contain an index in R.
For example, in D(p|p|I|O), $\sigma^2(I)$ does not enter $\sigma^2(\Delta)$ because this variance
component does not contain O. From another point of view, $\sigma^2(I)$ does not
enter $\sigma^2(\Delta)$ because I is fixed in the universe of generalization, and,
therefore, $\mu_I^{\sim}$ is a constant for all persons.
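The effect of the universe of generalization on the p x i x o results can be illustrated numerically. The following sketch assumes hypothetical G study variance components and D study sample sizes (none of these values come from the paper), and assumes that each D study component is the G study component divided by the sample sizes of the facets it contains, with the person index given a frequency of one.

```python
# Hypothetical G study variance components for p x i x o (illustrative only).
G = {"p": 0.30, "i": 0.05, "o": 0.02, "pi": 0.20,
     "po": 0.10, "io": 0.01, "pio": 0.40}
n = {"i": 10, "o": 2}   # hypothetical D study sample sizes n_i', n_o'


def d_study_components(g, n):
    """Divide each G study component by the D study sample size of every
    facet index it contains; 'p' (object of measurement) has frequency 1."""
    out = {}
    for comp, var in g.items():
        d = 1
        for idx in comp:
            d *= n.get(idx, 1)
        out[comp] = var / d
    return out


def summarize(g, n, random_facets):
    """Combine D study components for person as the object of measurement."""
    comp = d_study_components(g, n)
    r = set(random_facets)
    with_p = {c: v for c, v in comp.items() if "p" in c}
    tau = sum(v for c, v in with_p.items() if not (set(c) & r))
    delta = sum(v for c, v in with_p.items() if set(c) & r)
    abs_error = sum(v for c, v in comp.items() if set(c) & r)
    return {"tau": tau, "delta": delta, "Delta": abs_error,
            "X": tau + delta, "rho2": tau / (tau + delta)}


for label, r in [("D(p|p|-|I,O)", "io"), ("D(p|p|I|O)", "o"), ("D(p|p|O|I)", "i")]:
    print(label, summarize(G, n, r))
```

With these values the expected observed score variance is the same in all three D studies, while the universe score variance and the generalizability coefficient grow as facets are fixed.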


Insert Table 10 about here 



The Design p x (i:s). Table 10 presents illustrative D studies for a
design that involves a single level of nesting in the universe of generalization.
For the second D study, D(Y|V|F|R) = D(p|p|S|I), with S fixed in the universe
of generalization,

$$\tau(p|p|S|I) = \mu_p^{\sim} + \mu_{pS}^{\sim} ;$$

$$\delta(p|p|S|I) = \mu_{pI:S}^{\sim} ; \text{ and}$$

$$\Delta(p|p|S|I) = \mu_{I:S}^{\sim} + \mu_{pI:S}^{\sim} .$$

In terms of the notational procedure for combining variance components, $\sigma^2(\tau)$
consists of variance components that contain p (the index in V) and do not
contain I (the index in R); i.e.,

$$\sigma^2(\tau) = \sigma^2(p) + \sigma^2(pS) .$$

Similarly, $\mathcal{E}\sigma^2(\delta)$ consists of variance components that contain p and I; i.e.,

$$\mathcal{E}\sigma^2(\delta) = \sigma^2(pI:S) ;$$

and $\sigma^2(\Delta)$ consists of variance components that contain I; i.e.,

$$\sigma^2(\Delta) = \sigma^2(I:S) + \sigma^2(pI:S) .$$

If, then, S is fixed in the universe of generalization,

$$\mathcal{E}\hat{\rho}^2 = \frac{\hat{\sigma}^2(p) + \hat{\sigma}^2(pS)}{\hat{\sigma}^2(p) + \hat{\sigma}^2(pS) + \hat{\sigma}^2(pI:S)} ,$$

whereas, if S is a sample from an infinite universe,

$$\mathcal{E}\hat{\rho}^2 = \frac{\hat{\sigma}^2(p)}{\hat{\sigma}^2(p) + \hat{\sigma}^2(pS) + \hat{\sigma}^2(pI:S)}$$

[see D(p|p|-|I,S) in Table 10].

The characteristics and utility of generalizability coefficients that
take stratification of content into account were studied by Cronbach and his
colleagues (Rajaratnam, Cronbach, & Gleser, 1965; Cronbach, Schonemann, &
McKie, 1965) shortly after their seminal work on generalizability theory
(Cronbach, Rajaratnam, & Gleser, 1963). They concluded that if the items in
a test can be divided into different content strata, then estimates of
reliability should take the stratification into account; otherwise, reliability
may be seriously underestimated.
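The two coefficients for the p x (i:s) design differ only in where $\sigma^2(pS)$ is placed. The sketch below compares them for hypothetical estimates; the numerical values, the function name, and the assumption that the D study components are $\sigma^2(ps)/n_s'$ and $\sigma^2(pi:s)/(n_i' n_s')$ are introduced here for illustration.

```python
def gen_coefs_p_by_items_in_strata(var_p, var_ps, var_pis, n_s, n_i):
    """Return (coefficient with S fixed, coefficient with S random)
    for the p x (i:s) design with n_s strata and n_i items per stratum."""
    pS = var_ps / n_s                 # sigma^2(pS)
    pIS = var_pis / (n_s * n_i)       # sigma^2(pI:S)
    total = var_p + pS + pIS          # expected observed score variance
    s_fixed = (var_p + pS) / total    # sigma^2(pS) counts as universe score variance
    s_random = var_p / total          # sigma^2(pS) counts as error variance
    return s_fixed, s_random


# Hypothetical G study estimates: 4 content strata, 5 items per stratum.
fixed, random_ = gen_coefs_p_by_items_in_strata(0.04, 0.02, 0.18, n_s=4, n_i=5)
print(fixed, random_)
```

Whenever the person-by-stratum component is positive, the coefficient that treats strata as fixed is the larger of the two, which is consistent with the warning that ignoring stratification can understate reliability.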



Insert Table 11 about here 



D Studies with Nesting in the Object of Measurement Component. Consider
the design (p:c) x i and the D study D(p:c|p|c|I) in Table 11. For this D study,
the object of measurement component, Y, is p:c, and each person is nested within
a particular class. Since both $\sigma^2(p:c)$ and $\sigma^2(c)$ contain only indices in
Y = p:c, the D study sampling frequency for each of these G study variance
components is unity (see Equation 13). For this D study, the universe of
generalization contains a single fixed class and an infinite universe of items,
from which a sample of $n_i'$ items is drawn. Consequently, $\sigma^2(c)$ does not enter
$\sigma^2(\tau)$, $\mathcal{E}\sigma^2(\delta)$, $\sigma^2(\Delta)$, or $\mathcal{E}\sigma^2(X)$. For example,

$$X(p:c|p|c|I) = X - \mathcal{E}_p X = X - \mu_{cI} = \mu_{p:c}^{\sim} + \mu_{pI:c}^{\sim} ;$$

and $\mathcal{E}\sigma^2(X) = \sigma^2(p:c) + \sigma^2(pI:c)$;

i.e., $\mathcal{E}\sigma^2(X)$ consists of variance components that contain p (the index in V).
It is particularly important to note that this D study is not identical to
the D study for the p x i design in Table 9 (see Brennan, 1975).



Insert Table 12 about here 



Table 12 provides illustrative D studies using p:c as the object of
measurement component in the design (p:c) x (i:s:t). Although these D studies
use a considerably more complicated design, it is relatively easy to apply
the notational procedure for combining variance components.



Insert Table 13 about here 



D Studies with Class as the Object of Measurement. Table 13 provides
illustrative D studies for the design (p:c) x i when the object of measurement
is the class, c, or more specifically the class mean:

$$X_{cPI} = \mu + \mu_c^{\sim} + \mu_{P:c}^{\sim} + \mu_I^{\sim} + \mu_{cI}^{\sim} + \mu_{PI:c}^{\sim} ,$$

where experimental error e is completely confounded with $\mu_{PI:c}^{\sim}$.

For the D study in Table 13 that involves generalization over both samples
of persons and samples of items,

$$X_{cPI} - \mu_I = \mu_c^{\sim} + \mu_{P:c}^{\sim} + \mu_{cI}^{\sim} + \mu_{PI:c}^{\sim} ;$$

and $\mathcal{E}\sigma^2(X) = \sigma^2(c) + \sigma^2(P:c) + \sigma^2(cI) + \sigma^2(PI:c)$;

i.e., $\mathcal{E}\sigma^2(X)$ consists of components that contain c (the index in V). As noted
previously, $\mathcal{E}\sigma^2(X)$ is unchanged by changes in the universe of generalization,
but this is not true for $\sigma^2(\tau)$, $\mathcal{E}\sigma^2(\delta)$, or $\sigma^2(\Delta)$. In particular, Table 13
shows that when generalization is over both persons and items,

$$\sigma^2(\tau) = \sigma^2(c) ;$$

when generalization is over items only,

$$\sigma^2(\tau) = \sigma^2(c) + \sigma^2(P:c) ;$$

and when generalization is over persons only,

$$\sigma^2(\tau) = \sigma^2(c) + \sigma^2(cI) .$$

The estimate of each of these three different universe score variances [or,
equivalently, the three different estimates of $\mathcal{E}\sigma^2(\delta)$] provides a different
estimate of the generalizability of class means. That is, these estimates
differ with respect to the facet(s) over which the decision maker generalizes.
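The three universe score variances for class means lead to three different generalizability coefficients from the same data. The following sketch makes this concrete under stated assumptions: the G study estimates and sample sizes are hypothetical, and the D study components are taken to be the G study components divided by $n_p'$, $n_i'$, or both, according to the indices P and I that they contain.

```python
def class_mean_coefficients(g, n_p, n_i):
    """Generalizability coefficients for class means in the (p:c) x i design
    under three universes of generalization."""
    c = g["c"]
    Pc = g["p:c"] / n_p               # sigma^2(P:c)
    cI = g["ci"] / n_i                # sigma^2(cI)
    PIc = g["pi:c"] / (n_p * n_i)     # sigma^2(PI:c)
    expected_observed = c + Pc + cI + PIc   # same for all three D studies
    return {
        "persons_and_items": c / expected_observed,
        "items_only": (c + Pc) / expected_observed,     # persons fixed
        "persons_only": (c + cI) / expected_observed,   # items fixed
    }


# Hypothetical G study estimates; 20 persons per class and 10 items.
g_hat = {"c": 0.03, "p:c": 0.20, "ci": 0.01, "pi:c": 0.50}
coefs = class_mean_coefficients(g_hat, n_p=20, n_i=10)
print(coefs)
```

Generalizing over both persons and items gives the smallest coefficient, because both $\sigma^2(P:c)$ and $\sigma^2(cI)$ then count as error.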



The estimation of reliability, or generalizability, when the object of
measurement is some aggregate of persons has been a particularly troublesome
problem in recent years (see Haney, 1974a, 1974b). In terms of published
literature, Medley and Mitzel (1963) and Pilliner and his colleagues (Maxwell
& Pilliner, 1968; Pilliner, 1965; and Pilliner, Sutherland, & Taylor, 1960)
appear to be among the earliest researchers to recognize that the class mean
is frequently the variable of interest, rather than the score for a person.
More recently, large scale evaluations, such as those undertaken for Head
Start (Smith & Bissell, 1970), Follow Through (Abt Associates, 1974; Haney, 1974b),
and the National Day Care Study (Stallings, Wilcox, & Travers, 1976), have
frequently required estimates of reliability when class mean is the object of
measurement. Similar issues arise in the study of course evaluation
questionnaires (Gillmore, Kane, & Naccarato, Note 1; Kane, Gillmore, & Crooks, 1976)
and studies of school effectiveness and accountability (Dyer, Linn, & Patton,
1969; Marco, 1974; Page, 1975).

The literature does contain some approaches to the estimation of reliability
for class means using classical test theory. For example, Shaycoft (1962),
Wiley (1970), and Thrash and Porter (Note 2) developed three different coefficients,
each of which assumes that an observed score is the sum of a true score and an
undifferentiated error term. However, each of these procedures makes different
specific assumptions about what constitutes an appropriate estimate of error
variance. As a result, each procedure gives a different estimate of the
reliability of class means. Kane and Brennan (1977) show that Wiley's coefficient
is equivalent to $\mathcal{E}\rho^2$ when items are fixed, Thrash and Porter's coefficient
is equivalent to $\mathcal{E}\rho^2$ when persons within classes are fixed, and Shaycoft's
coefficient is an overestimate of $\mathcal{E}\rho^2$ when persons within classes are fixed.
It is not surprising that none of these coefficients corresponds to $\mathcal{E}\rho^2$ when
generalization is over both persons and items. Classical test theory does not
specifically allow for differentiating among sources of error in a multi-faceted
universe of generalization.



Insert Table 14 about here 



Table 14 provides illustrative D studies using the class mean as the object
of measurement in the design (p:c) x (i:s:t). The reader can easily verify the
results in Table 14 using the notational procedure for combining variance
components. D studies for this design are clearly more complicated than those
for the (p:c) x i design; however, in large scale testing efforts involving
analysis of class means it is frequently the case that data are collected
according to rather complicated sampling plans. To overlook this complexity
is to discard some amount of information in the data, and, therefore, to
potentially restrict the utility of the results.



Sampling from Finite Universes

To this point, our discussion of generalizability theory has focused on
D studies in which each of the facets in the universe of generalization is either
fixed (i.e., $n' = N' < \infty$) or essentially infinite (i.e., $n' < N' \to \infty$). We have
seen that such D studies can be carried out using G study random effects variance
components, or, more specifically, variance components for single observations
based on random sampling of one condition of each facet from an infinite universe
of admissible conditions, or observations, for the facet ($N \to \infty$). It is also
possible to develop equations for calculating G study variance components for
random sampling of one condition of each facet from a finite universe of admissible
conditions for the facet ($N < \infty$). These G study variance components are especially
useful in D studies characterized by sampling from a finite universe of
generalization.

Unfortunately, any verbal discussion of different sampling procedures in
typical ANOVA terms is apt to involve considerable ambiguity. The problem is
primarily evident in the term "random effect," which, in traditional ANOVA
terms, actually implies "random sampling from an infinite universe," as opposed
to no sampling at all (i.e., "fixed effect"), or random sampling from a finite
universe. It is particularly important to note that the traditional ANOVA
notion of "random effect" does not mean sampling from a finite universe, even
though such sampling is "random." For this reason we will restrict our use of
the term "random effect" to random sampling from an infinite universe.

G Study Considerations

In this section we develop equations for G study estimated variance
components and expected mean squares for any model M. That is, these equations
are applicable to $n < N < \infty$ (sampling from a finite universe of admissible
observations), $n = N < \infty$ (fixed effect), and $n < N \to \infty$ (random effect).

Estimation of Variance Components. If $\hat{\sigma}^2(\alpha|M)$ is the estimated variance
for the component $\alpha$, given a G study design using the model M, then

$$\hat{\sigma}^2(\alpha|M) = \hat{\sigma}^2(\alpha) + \sum_j \frac{\hat{\sigma}^2(\beta_j)}{F(\beta_j)} , \qquad (27)$$

where $\hat{\sigma}^2(\alpha)$ and $\hat{\sigma}^2(\beta_j)$ are estimated G study variance components for the random
effects model calculated from Algorithm 2;

$\beta_j$ = any component, except $\alpha$, that contains all the indices in $\alpha$; and

$F(\beta_j)$ = the product of the G study universe sizes (N's) for all indices
in $\beta_j$ except those indices in $\alpha$.

As in Algorithm 2, no distinction is made between nested and non-nested indices.

If $N \to \infty$ for all facets, then $\hat{\sigma}^2(\beta_j)/F(\beta_j)$ is always zero, and $\hat{\sigma}^2(\alpha|M) = \hat{\sigma}^2(\alpha)$.
If, on the other hand, all effects are fixed, then all universe sizes (N's) equal
the sample sizes (n's) in the G study. In the case of mixed models, some effects
are random and some fixed. For other models that involve sampling from a finite
universe for one or more facets, the actual universe size is used in Equation 28.

For example, in the design (p:c) x i, consider the component p:c. The
only other component that contains the indices p and c is pi:c; therefore,

$$\hat{\sigma}^2(p:c|M) = \hat{\sigma}^2(p:c) + \frac{\hat{\sigma}^2(pi:c)}{N_i} .$$

Now, if i is a fixed effect in the G study, then $N_i = n_i$ and

$$\hat{\sigma}^2(p:c|M) = \hat{\sigma}^2(p:c) + \frac{\hat{\sigma}^2(pi:c)}{n_i} .$$

If, on the other hand, i is a random effect, then the universe size is
considered infinite and $\hat{\sigma}^2(p:c|M) = \hat{\sigma}^2(p:c)$. If $n_i$ is a sample from a finite
universe of size $N_i$, then the actual value of $N_i$ is used in the above equation.

Also, in the design (p:c) x i, consider the component i. The components
that contain i are ci and pi:c; therefore,

$$\hat{\sigma}^2(i|M) = \hat{\sigma}^2(i) + \frac{\hat{\sigma}^2(ci)}{N_c} + \frac{\hat{\sigma}^2(pi:c)}{N_p N_c} .$$

If, for example, p is a random effect and c is a fixed effect in the G study, then

$$\hat{\sigma}^2(i|M) = \hat{\sigma}^2(i) + \frac{\hat{\sigma}^2(ci)}{n_c} .$$
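The rule for $\hat{\sigma}^2(\alpha|M)$ can be sketched as a short function. This is an illustrative sketch rather than the paper's algorithm as published: the function names, the label strings for components, and the numerical estimates are assumptions, and an infinite universe is represented by `math.inf` so that the corresponding terms vanish.

```python
from math import inf, prod


def idx(component):
    """Set of indices in a label such as 'pi:c' (nesting is ignored)."""
    return frozenset(ch for ch in component if ch.isalpha())


def variance_given_model(alpha, random_effects, universe_size):
    """sigma^2(alpha|M) = sigma^2(alpha) + sum_j sigma^2(beta_j) / F(beta_j).

    random_effects: random effects estimates keyed by component label.
    universe_size:  G study universe size N for each index (inf if random).
    """
    a = idx(alpha)
    total = random_effects[alpha]
    for beta, var in random_effects.items():
        b = idx(beta)
        if beta != alpha and a <= b:        # beta contains all indices in alpha
            f = prod(universe_size[k] for k in b - a)
            total += var / f                # zero when any N is infinite
    return total


# Hypothetical random effects estimates for the (p:c) x i design.
est = {"c": 0.03, "p:c": 0.20, "i": 0.05, "ci": 0.01, "pi:c": 0.50}

# i fixed with N_i = n_i = 10; p and c random.
print(variance_given_model("p:c", est, {"p": inf, "c": inf, "i": 10}))
# p random, c fixed with N_c = n_c = 8.
print(variance_given_model("i", est, {"p": inf, "c": 8, "i": inf}))
```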



Expected Mean Squares. For any model M, the expected mean square for the
component $\beta$ is:

$$EMS(\beta|M) = \sum_{\alpha} h(\alpha) f(\alpha) \hat{\sigma}^2(\alpha) , \qquad (29)$$

where $\alpha$ is any component that contains all the indices in $\beta$; $f(\alpha)$ is defined
by Equation 4; $h(\alpha)$ is the product of the terms $(1 - n/N)$ for all main effect
indices in $\alpha$ that are not in $\beta$; and $\hat{\sigma}^2(\alpha)$ is the estimated random effects G study
variance component for $\alpha$ calculated from Algorithm 2.

For the component p in the design p x (i:s),

$$EMS(p|M) = \left(1 - \frac{n_i}{N_i}\right) \hat{\sigma}^2(pi:s) + \left(1 - \frac{n_s}{N_s}\right) n_i \hat{\sigma}^2(ps) + n_i n_s \hat{\sigma}^2(p) .$$

If both items and subtests are random effects, then both $(1 - n_i/N_i)$ and
$(1 - n_s/N_s)$ are unity and $EMS(p|M)$ equals $EMS(p)$ for the random effects model.
If items are random and subtests are fixed, then $(1 - n_i/N_i)$ is unity,
$(1 - n_s/N_s)$ is zero, and

$$EMS(p|M) = \hat{\sigma}^2(pi:s) + n_i n_s \hat{\sigma}^2(p) .$$

If items are random, and the subtests in the G study are a sample of size $n_s$
from a finite universe of size $N_s$, then

$$EMS(p|M) = \hat{\sigma}^2(pi:s) + \left(1 - \frac{n_s}{N_s}\right) n_i \hat{\sigma}^2(ps) + n_i n_s \hat{\sigma}^2(p) .$$
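The expected mean square rule can be sketched in code under two assumptions that are not spelled out in this section: that $f(\alpha)$ (Equation 4, not reproduced here) is the product of the G study sample sizes for all indices not in $\alpha$, and that the "main effect indices" of a component are those written before the nesting operator. Both readings reproduce the three p x (i:s) results given above. The function names and numerical estimates are hypothetical.

```python
from math import inf


def all_indices(component):
    """All indices in a label such as 'pi:s', nested or not."""
    return set(component.replace(":", ""))


def main_effect_indices(component):
    """Indices before ':' (assumed reading of 'main effect indices')."""
    return set(component.split(":")[0])


def expected_mean_square(beta, variances, n, N):
    """EMS(beta|M) = sum over alpha of h(alpha) * f(alpha) * sigma^2(alpha)."""
    b = all_indices(beta)
    ems = 0.0
    for alpha, var in variances.items():
        a = all_indices(alpha)
        if not b <= a:                 # alpha must contain every index in beta
            continue
        f = 1                          # assumed form of Equation 4
        for k in n:
            if k not in a:
                f *= n[k]
        h = 1.0
        for k in main_effect_indices(alpha) - b:
            h *= 1 - n[k] / N[k]       # equals 1 when N[k] is infinite
        ems += h * f * var
    return ems


# Hypothetical random effects estimates for p x (i:s): 5 items in each of 4 subtests.
est = {"p": 0.04, "s": 0.01, "i:s": 0.03, "ps": 0.02, "pi:s": 0.18}
n = {"p": 50, "i": 5, "s": 4}

print(expected_mean_square("p", est, n, {"p": inf, "i": inf, "s": inf}))  # random
print(expected_mean_square("p", est, n, {"p": inf, "i": inf, "s": 4}))    # s fixed
print(expected_mean_square("p", est, n, {"p": inf, "i": inf, "s": 8}))    # finite N_s
```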

D Study Considerations

The discussion thus far has focused on D studies in which each of the
facets in the universe of generalization is either fixed (i.e., $n' = N' < \infty$) or
essentially infinite (i.e., $n' < N' \to \infty$). It has also been assumed that G
study variance components are reported for an infinite universe of admissible
observations (i.e., $N \to \infty$). For most D studies these assumptions are quite
reasonable; however, a D study might involve sampling from a finite universe
of generalization. More specifically, it is possible that, for one or more
facets, $n' < N' = N < \infty$. For each such facet, the D study uses a
sample of size $n'$ from a finite universe of generalization of size $N'$, which
is identical to the universe size, N, assumed in the G study.

For D studies characterized by sampling from a finite universe, a limiting
case occurs when $n' = N' = N < \infty$. In this case, the D study actually includes
all conditions of the facet in the universe of generalization; and the facet
is fixed in the universe of generalization. Another limiting case occurs
when $n' < N' = N \to \infty$. In this case, the D study includes a random sample from
the (essentially) infinite set of conditions for the facet. (This is the definition
of a random effect in the typical ANOVA sense.) When $n' < N' = N < \infty$, it is
also assumed that the sampling of the $n'$ conditions is random, but the universe
of generalization for the facet is finite.

Let us consider the case in which $N' = N < \infty$ for only one of the facets
in the universe of generalization, and the D study involves sampling this facet
$n' < N$ times. In general, the steps involved in conducting the D study are:
(a) use Algorithm 3 to obtain G study variance components which reflect the
fact that $N < \infty$; (b) obtain D study variance components that take into account
sampling from a finite universe; and (c) employ procedures for combining D
study variance components, as appropriate.

Consider, for example, the design p x i x o with p as the D study object
of measurement. Let us assume that, in the universe, the item facet has a
finite number of conditions, $N_i$, which are sampled $n_i'$ times in the D study.
Since $N_i < \infty$, the estimated G study variance components are obtained using
Algorithm 3. They are reported in Table 15 for the p x i x o design.



Insert Table 15 about here

For any D study component, $\alpha$, the estimated variance of this component is:

$$\hat{\sigma}^2(\alpha|N_i < \infty) = \left(1 - \frac{n_i'}{N_i}\right) \frac{\hat{\sigma}^2_G(\alpha|N_i < \infty)}{d(\alpha|Y)} \qquad (30)$$

if $n_i'$ is in $d(\alpha|Y)$; otherwise,

$$\hat{\sigma}^2(\alpha|N_i < \infty) = \frac{\hat{\sigma}^2_G(\alpha|N_i < \infty)}{d(\alpha|Y)} , \qquad (31)$$

where the subscript G marks the corresponding G study variance component,
$d(\alpha|Y)$ is defined as in Equation 13, and $(1 - n_i'/N_i)$ is the finite universe
correction (see Cochran, 1963, p. 23) associated with variances for the item
facet. Table 16 reports the estimated D study variance components for the
design p x i x o when p is the object of measurement.

It is important to note that the D study variance components defined in
Equations 30 and 31 are for a random sampling model where $N_i < \infty$ and $N_o \to \infty$.
These variance components are completely analogous to the D study variance
components for a random effects model reported in the fourth column of Table 8.
Indeed, for $N_i \to \infty$ the D study variance components in Table 16 are identical
to the D study variance components in Table 8. Also, Equations 17-20 and the
corresponding notational procedure for combining variance components are
completely applicable to D study variance components that involve sampling from
a finite universe.

Consider, again, Table 16 and suppose that the D study is D(p|p|-|I,O),
implying that occasions are randomly sampled from an infinite universe and
items are randomly sampled from a finite universe of size $N_i' = N_i$. For this
D study, the reader can verify that

$$\sigma^2(\tau) = \sigma^2(p|N_i < \infty) = \sigma^2(p) + \sigma^2(pi)/N_i ;$$

$$\mathcal{E}\sigma^2(\delta) = \sigma^2(pI|N_i < \infty) + \sigma^2(pO|N_i < \infty) + \sigma^2(pIO|N_i < \infty)$$

$$= \left(1 - \frac{n_i'}{N_i}\right) \frac{\sigma^2(pi)}{n_i'} + \frac{\sigma^2(po)}{n_o'} + \frac{\sigma^2(pio)}{N_i n_o'} + \left(1 - \frac{n_i'}{N_i}\right) \frac{\sigma^2(pio)}{n_i' n_o'} ;$$

and,

$$\mathcal{E}\sigma^2(X) = \sigma^2(\tau) + \mathcal{E}\sigma^2(\delta) = \sigma^2(p) + \frac{\sigma^2(pi)}{n_i'} + \frac{\sigma^2(po)}{n_o'} + \frac{\sigma^2(pio)}{n_i' n_o'} , \qquad (32)$$

where the variance components without the conditional statement "$N_i < \infty$" are
the usual random effects D study variance components.

It is both informative and instructive to note that Equation 32 is identical
to Equation 22; i.e., $\mathcal{E}\sigma^2(X)$ is unchanged by whether or not the universe of
generalization involves sampling from a finite universe. This is true for all
of the possible D studies given a particular design and a particular object of
measurement.
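The invariance of the expected observed score variance can be checked numerically. The sketch below is restricted to the components that contain p in the p x i x o design, with items sampled from a finite universe and occasions from an infinite one. The variance components, sample sizes, universe size, and function names are hypothetical, and the G study step applies the rule for $\hat{\sigma}^2(\alpha|M)$ given earlier with $N_o$ infinite.

```python
def g_components_finite_items(g, N_i):
    """G study components given N_i finite and N_o infinite: add
    sigma^2(beta)/N_i for each beta that contains alpha plus the index i."""
    return {
        "p": g["p"] + g["pi"] / N_i,
        "pi": g["pi"],
        "po": g["po"] + g["pio"] / N_i,
        "pio": g["pio"],
    }


def d_components_finite_items(g, N_i, n_i, n_o):
    """D study components: the finite universe correction applies only to
    components whose divisor d includes the item sample size n_i'."""
    gf = g_components_finite_items(g, N_i)
    fuc = 1 - n_i / N_i
    return {
        "p": gf["p"],
        "pI": fuc * gf["pi"] / n_i,
        "pO": gf["po"] / n_o,
        "pIO": fuc * gf["pio"] / (n_i * n_o),
    }


def expected_observed_random(g, n_i, n_o):
    """Expected observed score variance for the random effects model."""
    return g["p"] + g["pi"] / n_i + g["po"] / n_o + g["pio"] / (n_i * n_o)


# Hypothetical random effects estimates and sample sizes.
g_hat = {"p": 0.30, "pi": 0.20, "po": 0.10, "pio": 0.40}
d = d_components_finite_items(g_hat, N_i=40, n_i=10, n_o=2)
print(sum(d.values()), expected_observed_random(g_hat, 10, 2))
```

Setting `n_i` equal to `N_i` makes the correction zero, which corresponds to a fixed item facet.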

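Consider, again, Table 16 and suppose the D study were D(p|p|O|I) with
occasions fixed. In this case,

$$\sigma^2(\tau) = \sigma^2(p|N_i < \infty) + \sigma^2(pO|N_i < \infty) = \sigma^2(p) + \frac{\sigma^2(pi)}{N_i} + \frac{\sigma^2(po)}{n_o'} + \frac{\sigma^2(pio)}{N_i n_o'} ;$$

$$\mathcal{E}\sigma^2(\delta) = \sigma^2(pI|N_i < \infty) + \sigma^2(pIO|N_i < \infty) = \left(1 - \frac{n_i'}{N_i}\right) \left[\frac{\sigma^2(pi)}{n_i'} + \frac{\sigma^2(pio)}{n_i' n_o'}\right] ;$$

and $\mathcal{E}\sigma^2(X)$ is identical to Equation 32.

If the D study design were D(p|p|I|O), then the item facet would be
fixed in this particular D study. In this case,

$$\sigma^2(\tau) = \sigma^2(p|N_i < \infty) + \sigma^2(pI|N_i < \infty) = \sigma^2(p) + \sigma^2(pi)/n_i' ; \qquad (33)$$

and

$$\mathcal{E}\sigma^2(\delta) = \sigma^2(pO|N_i < \infty) + \sigma^2(pIO|N_i < \infty) = \sigma^2(po)/n_o' + \sigma^2(pio)/n_i' n_o' . \qquad (34)$$

Equations 33 and 34 are identical to those obtained using the fifth column of
Table 8. This must be so, because when the item facet is fixed in the universe
of generalization there is, by definition, no random sampling of the conditions
of this facet; and the size of the universe has no bearing on $\sigma^2(\tau)$, $\mathcal{E}\sigma^2(\delta)$,
$\sigma^2(\Delta)$, or any quantities formed from them.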

The procedures discussed above can be extended to D studies that involve
sampling from a finite universe for more than one facet. In such cases, estimated
G study variance components are obtained using Algorithm 3, and estimated D
study variance components are obtained using a more general version of Equations
30 and 31. For example, if the D study involves sampling from a finite universe
for both the item facet and the occasion facet in the p x i x o design, then
the finite universe correction in Equation 30 is:

(a) $(1 - n_i'/N_i)(1 - n_o'/N_o)$ if $d(\alpha|Y)$ includes both $n_i'$ and $n_o'$;

(b) $(1 - n_i'/N_i)$ if $d(\alpha|Y)$ includes $n_i'$ but not $n_o'$; and

(c) $(1 - n_o'/N_o)$ if $d(\alpha|Y)$ includes $n_o'$ but not $n_i'$.

If $d(\alpha|Y)$ includes neither $n_i'$ nor $n_o'$, then Equation 31 is applicable.
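Cases (a) through (c) and the fallback to Equation 31 follow a single pattern: one correction factor for each facet whose sample size appears in $d(\alpha|Y)$. A minimal sketch of that pattern follows; the function name, argument layout, and numerical values are hypothetical, and an infinite universe is represented by `math.inf`, which makes the corresponding factor equal to one.

```python
from math import inf


def d_study_variance(g_var, d_facets, n_prime, N):
    """D study variance for one component.

    g_var:    G study variance component (already reflecting finite N's).
    d_facets: facets whose D study sample sizes appear in d(alpha|Y).
    n_prime:  D study sample size for each facet.
    N:        universe size for each facet (inf for an infinite universe).
    """
    d = 1
    correction = 1.0
    for facet in d_facets:
        d *= n_prime[facet]
        correction *= 1.0 - n_prime[facet] / N[facet]
    return correction * g_var / d


n_prime = {"i": 10, "o": 2}
N = {"i": 40, "o": 6}

print(d_study_variance(0.40, "io", n_prime, N))   # case (a): both corrections
print(d_study_variance(0.20, "i", n_prime, N))    # case (b): items only
print(d_study_variance(0.10, "o", n_prime, N))    # case (c): occasions only
print(d_study_variance(0.30, "", n_prime, N))     # neither: no correction, d = 1
```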




Comments and Conclusions

It is usual, in both practical and theoretical contexts, to treat issues
of reliability from a correlational viewpoint. The literature, for example,
is filled with references to reliability coefficients that estimate "internal
consistency," "equivalence," "stability," etc. While such coefficients and
terms have a long and distinguished history, they can be a source of considerable
confusion and ambiguity. In particular, it is frequently difficult to identify
explicitly the magnitudes, types, and sources of error variance incorporated in
such coefficients. The use of generalizability coefficients can avoid these
problems, at least in part, if the nature of the universe of generalization is
clearly specified. However, estimated variance components are even more
informative and less ambiguous. Indeed, estimated variance components are the
most informative outcome of a reliability study (APA, 1974). They can be used
directly to obtain estimates of universe score variance and different types
of error variance that are appropriate in different decision-making contexts.
Variance components can be used, of course, to estimate generalizability coefficients,
but such coefficients are of questionable value in the absence of the estimated
variance components themselves. Note that it is the magnitude of variance
components that is of primary interest, not their statistical significance.
Also, variance components should not be expressed solely as a percentage or
proportion of some total score variance. To do so is to obviate the more
important uses of variance components.

Since the magnitudes of variance components are central to generalizability
theory, it is important that the numerical estimates of variance components
be as accurate as possible. Therefore, care should be taken to avoid the
deleterious effects of rounding errors. For example, it is usually advisable
that most, if not all, calculations involve at least three decimal places.
This is particularly important when a G study involves binary data, which is the
usual case for achievement tests.



The notational system used in this paper was invented in order to facilitate
the statement of various rules, procedures, and algorithms. There are only
two principal ways in which this notational system differs from that used by
Cronbach et al. (1972). First, this paper uses the nesting operator ":" to
designate variance components that involve nesting; Cronbach and his colleagues
use the "all confounded effects" procedure. Second, this paper specifies a
particular D study using the notation D(Y|V|F|R). The notation D(Y|V|F|R) is
very useful in specifying rules and procedures for combining D study variance
components. Also, this notation clearly identifies the universe of generalization,
and clearly distinguishes between the object of measurement and the universe of
generalization. Cronbach et al. (1972) treat object of measurement considerations,
but they do not emphasize them as much as this paper does. However, Cronbach et al.
(1972) do clearly identify a fixed facet by concatenating its index with the 
symbol "*" or In terms of certain theoretical expositions the star 

notation has some distinct advantages. 

This paper treats only G studies and D studies that involve orthogonal
analysis of variance designs; i.e., designs that do not involve missing data
and/or unequal size subgroups. The application of generalizability theory
to non-orthogonal designs has received little attention in the literature. There
are, however, two procedures that have been used or suggested for "converting"
non-orthogonal designs to orthogonal ones. Kane et al. (1976), for example,
report randomly discarding data until they had orthogonal designs for their
studies of student evaluations of teaching. Also, for designs such as p x (i:s),
where the number of items is not a constant for all subtests, Cronbach et al.
(1965) mention the possibility of using "half-sets" of items within each subtest.
These procedures may not be ideal, but they are at least reasonable alternatives
until research on variance components in non-orthogonal designs (see Searle,
1971) is applied to generalizability theory.

This paper provides a more detailed consideration of sampling
from finite universes than is provided in Cronbach et al. (1972). Also, somewhat
more consideration is given to generalizability theory in the context of
different objects of measurement. However, in other respects this paper is not
intended to cover, in depth or breadth, the extensive treatment of generalizability
theory provided by Cronbach and his colleagues. (In particular, multivariate
generalizability theory has not been treated at all here.) Rather, this paper
is primarily intended to provide researchers and practitioners with a set of
procedures to facilitate the application of generalizability theory to a broad
range of measurement problems. It is inadvisable that these procedures be
used mindlessly; the meaningful interpretation of any statistical analysis
necessitates a thoughtful and informed consideration of the results.



Reference Notes

1. Gillmore, G. M., Kane, M. T., & Naccarato, R. W. The teacher and the
   course as units of analysis in the generalizability of student ratings of
   instruction (Report No. 77-9). Seattle, Washington: University of
   Washington, Educational Assessment Center, February 1977.

2. Thrash, S. K., & Porter, A. C. Invalidity of a current method for
   estimating reliability. Paper presented at the annual meeting of the
   National Council on Measurement in Education, Chicago, April 1974.



66 



Generalizability 
62 



Referencos 



*Abt Associates. Education as Exper imentat ion ; Evaluation of the . Follow Through 
• Planned variation model (2 yolsj'. Cambridge^ Mass,: Abt Associates, 

March 1974^ * . - ' * ^ 

Alexander, W. The estimation of reliability .when several trials are available. 

PsydTometrika , 1947, 12, 79-99. ' * , . , 

-American Psychological Association. Standards for t^ducati'Dnal and psycolocrical 

tests (rev. ed.). Washington, 0.07:' America^ Psychological Association, 1974* 
Brennan, R. L, The calculation of reliability from a split-plot factorial design. 

Educational and Psychological Measurement , 1375, 35.' 779-788. 
Brennan, R. L. , & Kane, M. T, An. index of dependability for mastery tests. 

Journal of Educational Measurement ,, in press: *(a) . 
Bi^ennan,.R. L., & Kane,- M. T-. Signal/noise ratios for domain-referenced tests. 

Psychometrika , in press, (b) . ^ * . 

Burt C. The analysis of examination marks. In P. Hartog S E, C. Rhodes (Eds.), 

The marks of examiners . London: The Macmillan Company, 193^. ^ 
Burt, C. Test reliability estimated by analysis of variance. British Journal 
of Statistical Psychology , 1955, 8, 103-118." 

— " " \ ** ' • 

Cardinet, j/, Tourneur , "y. , s Allal, L. The symmetry of generalizability theory: 

Applications to educational measurement. Journal of Educational Measa^^nTT" 

1976, 13, 119-135. ^ . , . 

Cochran, W. G. Sampling techniques (2nd ed.). New York: Wtley 1963 . 
. Cornfield, J., s Tukey, J. W. Average values of mean squares in factcr^als-. 

Annals of Mathematica l Statistics , 1956, 27, 907-949. ' 
Cronbach, L. J. Coefficient alpha and the internal structure of tests. 

Psychometrika , 1951, 16, 292-334. 



ERIC 



67 



Generalizabillty 

-a * * , 63 ' 

Cronbach, L. J., Deken, J. E., & Webb, N. Research on classrooms and schools:
Formulation of questions, design, and analysis. Stanford, California:
Stanford Evaluation Consortium, 1976.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. The dependability
of behavioral measurements: Theory of generalizability for scores and
profiles. New York: Wiley, 1972.

Cronbach, L. J., Ikeda, M., & Avner, R. A. Intraclass correlation as an
approximation to the coefficient of generalizability. Psychological
Reports, 1964, 15, 727-736.

Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. Theory of generalizability:
A liberalization of reliability theory. British Journal of Statistical
Psychology, 1963, 16, 137-163.

Cronbach, L. J., Schonemann, P., & McKie, T. Alpha coefficients for
stratified-parallel tests. Educational and Psychological Measurement,
1965, 25, 291-312.

Dyer, H. S., Linn, R. L., & Patton, M. J. A comparison of four methods of
obtaining discrepancy measures based on observed and predicted school
system means on achievement tests. American Educational Research Journal,
1969, 6, 591-605.

Ebel, R. L. Estimation of the reliability of ratings. Psychometrika, 1951,
16, 407-424.

Endler, N. S. Estimating variance components from mean squares for random and
mixed effects analysis of variance models. Perceptual and Motor Skills,
1966, 22, 559-570.

Finlayson, S. The reliability of marking essays. British Journal of
Educational Psychology, 1951, 35, 143-162.




Gleser, G. C., Cronbach, L. J., & Rajaratnam, N. Generalizability of scores
influenced by multiple sources of variance. Psychometrika, 1965, 30, 395-418.

Haney, W. The dependability of group mean scores. Unpublished special
qualifying paper, Harvard Graduate School of Education, October 1974. (a)

Haney, W. Units of analysis issues in the evaluation of Project Follow Through.
Cambridge, Mass.: The Huron Institute, September 1974. (b)

Hoyt, C. J. Test reliability estimated by analysis of variance. Psychometrika,
1941, 6, 153-160.

Hunter, J. E. Probabilistic foundations for coefficients of generalizability.
Psychometrika, 1968, 33, 1-18.

Jackson, R. W. B., & Ferguson, G. Studies on the reliability of tests.
University of Toronto, 1941.

Kirk, R. E. Experimental design: Procedures for the behavioral sciences.
Belmont, Calif.: Wadsworth, 1968.

Kuder, G. F., & Richardson, M. W. The theory of the estimation of test
reliability. Psychometrika, 1937, 2, 151-160.

Kane, M. T., & Brennan, R. L. The generalizability of class means. Review of
Educational Research, 1977, 47, 267-292.

Kane, M. T., Gillmore, G. M., & Crooks, T. J. Student evaluations of teaching:
The generalizability of class means. Journal of Educational Measurement,
1976, 13, 171-183.

Lindquist, E. F. Design and analysis of experiments in psychology and
education. Boston: Houghton Mifflin, 1953.

Lord, F. M. Do tests of the same length have the same standard error of
measurement? Educational and Psychological Measurement, 1957, 17, 510-521.

Lord, F. M. Test reliability: A correction. Educational and Psychological
Measurement, 1962, 22, 511-512.




Lord, F. M., & Novick, M. R. Statistical theories of mental test scores.
Reading, Mass.: Addison-Wesley, 1968.

Loveland, E. H. Measurement of factors affecting test-retest reliability.
Unpublished doctoral dissertation, University of Tennessee, 1952.

Marco, G. L. A comparison of selected school effectiveness measures based on
longitudinal data. Journal of Educational Measurement, 1974, 11, 225-234.

Maxwell, A. E., & Pilliner, A. E. G. Deriving coefficients of reliability and
agreement for ratings. British Journal of Mathematical and Statistical
Psychology, 1968, 21, 105-116.

Medley, D. M., & Mitzel, H. E. Measuring classroom behavior by systematic
observation. In N. L. Gage (Ed.), Handbook of research on teaching.
Chicago, Illinois: Rand McNally, 1963.

Millman, J., & Glass, G. V. Rules of thumb for the ANOVA table. Journal of
Educational Measurement, 1967, 4, 41-51.

Page, E. B. Statistically recapturing the richness within the classroom.
Psychology in the Schools, 1975, 12, 339-344.

Pilliner, A. E. G. The application of analysis of variance components in
psychometric experimentation. Unpublished doctoral dissertation,
University of Edinburgh, 1965.

Pilliner, A. E. G., Sutherland, J., & Taylor, E. G. Zero error in Moray House
verbal reasoning tests. British Journal of Educational Psychology, 1960,
30, 53-62.

Rajaratnam, N. Reliability formulas for independent decision data when
reliability data are matched. Psychometrika, 1960, 25, 261-271.

Rajaratnam, N., Cronbach, L. J., & Gleser, G. C. Generalizability of stratified
parallel tests. Psychometrika, 1965, 30, 39-56.




Scheffe, H. The analysis of variance. New York: Wiley, 1959.

Searle, S. R. Linear models. New York: Wiley, 1971.

Shaycoft, M. F. The statistical characteristics of school means. In J. C.
Flanagan, J. T. Dailey, M. F. Shaycoft, D. B. Orr, & I. Goldberg, Studies
of the American high school. Pittsburgh: University of Pittsburgh, 1962.

Smith, M. S., & Bissell, J. S. Report analysis: The impact of Head Start.
Harvard Educational Review, 1970, 40, 51-104.

Stallings, J., Wilcox, M., & Travers, J. Phase II instruments for the national
day care cost-effects study: Instrument selection and field testing.
Menlo Park, Calif.: Stanford Research Institute, January 1976.

Vaughn, G. M., & Corballis, M. C. Beyond tests of significance: Estimating
strength of effects in selected ANOVA designs. Psychological Bulletin,
1969, 72, 204-213.

Wiley, D. E. Design and analysis of evaluation studies. In D. E. Wiley &
M. C. Wittrock (Eds.), The evaluation of instruction: Issues and problems.
New York: Holt, Rinehart, and Winston, 1970.

Winer, B. J. Statistical principles in experimental design (2nd ed.).
New York: McGraw-Hill, 1971.

Webster, H. A generalization of Kuder-Richardson reliability formula 21.
Educational and Psychological Measurement, 1960, 20, 131-138.






Footnotes 



The author would like to thank Dr. Lee J. Cronbach, Dr. Michael T. Kane,
Dr. Gerald Gillmore, and Dr. Michael B. Bunch for their many helpful comments
and suggestions.

This research was partially supported by PNR Contract No. N00123-77-C-0739 
between the American College Testing Program and the Navy Personnel Research and 
Development Center. 

A previous version of this paper, entitled "'Rules of Thumb' for Generalizability
Analyses," was presented at the Annual Meeting of the American Educational
Research Association, April 1977.

It may not be obvious that designs like (p:c) x (i:s:t) occur in practice.
Suppose c is a school, i is an item, s is a content area or subtest, and t is a
test. Given these verbal identifiers, this design means that each person is
nested within a single school, each person responds to all items, each item
is associated with a single content area or subtest, and each content area is
associated with a single test. This kind of design very closely approximates
the kind of data often collected to assess the reliability of test batteries.
However, it is rarely the case that the analyses of such data distinguish among
all potential sources of variance. Among other things, this paper is intended
to aid researchers and practitioners in conceptualizing and performing such
complex analyses.
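
The nesting described in this footnote can be made concrete with a small sketch. It assumes a balanced design and zero-based labels that restart within each parent (persons within schools, items within subtests, subtests within tests); the function name `design_cells` is hypothetical and is not part of the paper.

```python
from itertools import product

def design_cells(n_c, n_p, n_t, n_s, n_i):
    """Enumerate the cells of a balanced (p:c) x (i:s:t) design.

    Each person is identified by (school, person-within-school) and each
    item by (test, subtest-within-test, item-within-subtest). Every person
    is crossed with every item, which yields one cell per pairing.
    """
    persons = [(c, p) for c in range(n_c) for p in range(n_p)]
    items = [(t, s, i)
             for t in range(n_t) for s in range(n_s) for i in range(n_i)]
    return [(c, p, t, s, i) for (c, p), (t, s, i) in product(persons, items)]

cells = design_cells(n_c=2, n_p=3, n_t=2, n_s=2, n_i=4)
print(len(cells))  # number of observed scores in the design
```

Because the labels restart within each parent, an item label is meaningful only together with its subtest and test, which is what the colon in i:s:t expresses.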

For each of the Venn diagrams in Figure 1, a circle is never nested within
the intersection of two or more circles. This is a geometric indication that,
for each of the five illustrative designs, no main effect is nested within an
interaction effect. Consider, however, the design (p:c) x [i:(s x o)], in
which the main effect for items is nested within the interaction of subtests
and occasions. This main effect would be represented as i:so.



The reader may omit this discussion of sums of squares without loss
of continuity in the development of generalizability theory. This section is
included because the notational system used here provides a convenient
way to express sums of squares for a large class of ANOVA designs.

Cronbach et al. (1972) usually use $\sigma^2(p)$ for universe score variance.
Here, however, the general use of $\sigma^2(p)$ for universe score variance could create
confusion, because objects of measurement other than the person p are treated
in this paper.



Generalizability coefficients have a form that is analogous to that of
traditional reliability coefficients; however, the theoretical basis for
generalizability coefficients is somewhat more complicated and beyond the
intended scope of this paper. The interested reader can refer to Hunter (1968)






TABLE 1

Estimated Variance Components for Design p x i
for Random Effects Model

Effect or
Component    df                      Estimated Variance Component

p            $n_p - 1$               $\hat{\sigma}^2(p) = [MS(p) - MS(pi)]/n_i$
i            $n_i - 1$               $\hat{\sigma}^2(i) = [MS(i) - MS(pi)]/n_p$
pi           $(n_p - 1)(n_i - 1)$    $\hat{\sigma}^2(pi) = MS(pi)$

Note. $X_{pi} = \mu + \mu_{p\sim} + \mu_{i\sim} + \mu_{pi\sim} + e$.
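
As a minimal sketch of how the Table 1 formulas might be applied, the following computes the three estimates from a persons-by-items score matrix with one observation per cell. The function name and the use of NumPy are assumptions made for illustration; the mean squares are the usual two-way ANOVA mean squares.

```python
import numpy as np

def variance_components_p_x_i(scores):
    """Estimate random-effects variance components for a crossed p x i design.

    `scores` is an (n_p, n_i) array: rows are persons, columns are items.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()

    # Sums of squares for persons, items, and the residual (pi) effect.
    ss_p = n_i * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_i = n_p * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_pi = np.sum((x - grand) ** 2) - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    # Table 1: solve the expected mean square equations for each component.
    return {
        "p": (ms_p - ms_pi) / n_i,
        "i": (ms_i - ms_pi) / n_p,
        "pi": ms_pi,
    }

print(variance_components_p_x_i([[1, 2], [3, 4]]))
```

With real data the estimates for p or i can come out negative, because they are differences of mean squares; the sketch returns them unchanged rather than choosing a convention for that case.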






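TABLE 2

Estimated Variance Components for Design p x i x o
for Random Effects Model

Effect or
Component    df                               Estimated Variance Component

p            $n_p - 1$                        $\hat{\sigma}^2(p) = [MS(p) - MS(pi) - MS(po) + MS(pio)]/n_i n_o$
i            $n_i - 1$                        $\hat{\sigma}^2(i) = [MS(i) - MS(pi) - MS(io) + MS(pio)]/n_p n_o$
o            $n_o - 1$                        $\hat{\sigma}^2(o) = [MS(o) - MS(po) - MS(io) + MS(pio)]/n_p n_i$
pi           $(n_p - 1)(n_i - 1)$             $\hat{\sigma}^2(pi) = [MS(pi) - MS(pio)]/n_o$
po           $(n_p - 1)(n_o - 1)$             $\hat{\sigma}^2(po) = [MS(po) - MS(pio)]/n_i$
io           $(n_i - 1)(n_o - 1)$             $\hat{\sigma}^2(io) = [MS(io) - MS(pio)]/n_p$
pio          $(n_p - 1)(n_i - 1)(n_o - 1)$    $\hat{\sigma}^2(pio) = MS(pio)$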
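
The Table 2 estimators can be sketched as a function of the seven mean squares. This is an illustration under the random effects model for the fully crossed p x i x o design; the function name and the dictionary keys are hypothetical choices, and the rows for p and pi follow the same expected mean square pattern as the other rows.

```python
def variance_components_p_x_i_x_o(ms, n_p, n_i, n_o):
    """Estimate variance components for p x i x o from its mean squares.

    `ms` maps each effect label ('p', 'i', 'o', 'pi', 'po', 'io', 'pio')
    to its mean square.
    """
    pio = ms["pio"]
    return {
        "p": (ms["p"] - ms["pi"] - ms["po"] + pio) / (n_i * n_o),
        "i": (ms["i"] - ms["pi"] - ms["io"] + pio) / (n_p * n_o),
        "o": (ms["o"] - ms["po"] - ms["io"] + pio) / (n_p * n_i),
        "pi": (ms["pi"] - pio) / n_o,
        "po": (ms["po"] - pio) / n_i,
        "io": (ms["io"] - pio) / n_p,
        "pio": pio,
    }

# Mean squares built from known components (1, 2, ..., 7) via the
# expected mean square equations, so the function should recover them.
ms = {"p": 50, "i": 51, "o": 52, "pi": 23, "po": 22, "io": 19, "pio": 7}
print(variance_components_p_x_i_x_o(ms, n_p=2, n_i=3, n_o=4))
```

Feeding in expected mean squares, as in the usage lines, is a quick way to check that each divisor matches the sample sizes of the facets that do not appear in the effect.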






TABLE 3

Estimated Variance Components for Design p x (i:s)
for Random Effects Model

Effect or
Component    df                          Estimated Variance Component

p            $n_p - 1$                   $\hat{\sigma}^2(p) = [MS(p) - MS(ps)]/n_i n_s$
i:s          $n_s(n_i - 1)$              $\hat{\sigma}^2(i:s) = [MS(i:s) - MS(pi:s)]/n_p$
s            $n_s - 1$                   $\hat{\sigma}^2(s) = [MS(s) - MS(i:s) - MS(ps) + MS(pi:s)]/n_p n_i$
ps           $(n_p - 1)(n_s - 1)$        $\hat{\sigma}^2(ps) = [MS(ps) - MS(pi:s)]/n_i$
pi:s         $n_s(n_p - 1)(n_i - 1)$     $\hat{\sigma}^2(pi:s) = MS(pi:s)$

Note. $X_{pi:s} = \mu + \mu_{p\sim} + \mu_{s\sim} + \mu_{i:s\sim} + \mu_{ps\sim} + \mu_{pi:s\sim}$.








TABLE 4

Estimated Variance Components for Design (p:c) x i
for Random Effects Model

Effect or
Component    df                          Estimated Variance Component

p:c          $n_c(n_p - 1)$              $\hat{\sigma}^2(p:c) = [MS(p:c) - MS(pi:c)]/n_i$
c            $n_c - 1$                   $\hat{\sigma}^2(c) = [MS(c) - MS(p:c) - MS(ci) + MS(pi:c)]/n_p n_i$
i            $n_i - 1$                   $\hat{\sigma}^2(i) = [MS(i) - MS(ci)]/n_p n_c$
ci           $(n_c - 1)(n_i - 1)$        $\hat{\sigma}^2(ci) = [MS(ci) - MS(pi:c)]/n_p$
pi:c         $n_c(n_p - 1)(n_i - 1)$     $\hat{\sigma}^2(pi:c) = MS(pi:c)$

Note. $X_{pi:c} = \mu + \mu_{c\sim} + \mu_{p:c\sim} + \mu_{i\sim} + \mu_{ci\sim} + \mu_{pi:c\sim}$.






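TABLE 5

Estimated Variance Components for Design (p:c) x (i:s:t)
for Random Effects Model

Effect or
Component    df                               Estimated Variance Component

p:c          $n_c(n_p - 1)$                   $\hat{\sigma}^2(p:c) = [MS(p:c) - MS(pt:c)]/n_i n_s n_t$
c            $n_c - 1$                        $\hat{\sigma}^2(c) = [MS(c) - MS(p:c) - MS(ct) + MS(pt:c)]/n_p n_i n_s n_t$
i:s:t        $n_s n_t(n_i - 1)$               $\hat{\sigma}^2(i:s:t) = [MS(i:s:t) - MS(ci:s:t)]/n_p n_c$
s:t          $n_t(n_s - 1)$                   $\hat{\sigma}^2(s:t) = [MS(s:t) - MS(i:s:t) - MS(cs:t) + MS(ci:s:t)]/n_p n_c n_i$
t            $n_t - 1$                        $\hat{\sigma}^2(t) = [MS(t) - MS(s:t) - MS(ct) + MS(cs:t)]/n_p n_c n_i n_s$
ct           $(n_c - 1)(n_t - 1)$             $\hat{\sigma}^2(ct) = [MS(ct) - MS(cs:t) - MS(pt:c) + MS(ps:ct)]/n_p n_i n_s$
cs:t         $n_t(n_c - 1)(n_s - 1)$          $\hat{\sigma}^2(cs:t) = [MS(cs:t) - MS(ci:s:t) - MS(ps:ct) + MS(pi:sc:t)]/n_p n_i$
ci:s:t       $n_s n_t(n_c - 1)(n_i - 1)$      $\hat{\sigma}^2(ci:s:t) = [MS(ci:s:t) - MS(pi:sc:t)]/n_p$
pt:c         $n_c(n_p - 1)(n_t - 1)$          $\hat{\sigma}^2(pt:c) = [MS(pt:c) - MS(ps:ct)]/n_i n_s$
ps:ct        $n_c n_t(n_p - 1)(n_s - 1)$      $\hat{\sigma}^2(ps:ct) = [MS(ps:ct) - MS(pi:sc:t)]/n_i$
pi:sc:t      $n_c n_s n_t(n_p - 1)(n_i - 1)$  $\hat{\sigma}^2(pi:sc:t) = MS(pi:sc:t)$


TABLE 6

Components, Mean Scores, and Score Effects for Design (p:c) x i

Component    Score Effects in Terms of Mean Scores                       Mean Scores in Terms of Score Effects

p:c          $\mu_{p:c\sim} = \mu_{p:c} - \mu_c$                          $\mu_{p:c} = \mu + \mu_{c\sim} + \mu_{p:c\sim}$
c            $\mu_{c\sim} = \mu_c - \mu$                                  $\mu_c = \mu + \mu_{c\sim}$
i            $\mu_{i\sim} = \mu_i - \mu$                                  $\mu_i = \mu + \mu_{i\sim}$
ci           $\mu_{ci\sim} = \mu_{ci} - \mu_c - \mu_i + \mu$              $\mu_{ci} = \mu + \mu_{c\sim} + \mu_{i\sim} + \mu_{ci\sim}$
pi:c         $\mu_{pi:c\sim} = \mu_{pi:c} - \mu_{p:c} - \mu_{ci} + \mu_c$ $\mu_{pi:c} = \mu + \mu_{c\sim} + \mu_{p:c\sim} + \mu_{i\sim} + \mu_{ci\sim} + \mu_{pi:c\sim}$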



TABLE 7

Sums of Squares for Design (p:c) x i

                                                               Sums of Squares for Observed Score Effects
             Sums of Squares                                   With respect to Sums of Squares                                With respect to
Component    for Observed Mean Scores                          for Observed Mean Scores                                       Observed Score Effects

p:c          $[X_{p:c}] = n_i \sum_p \sum_c \bar{X}_{p:c}^2$    $[X_{p:c\sim}] = [X_{p:c}] - [X_c]$                            $n_i \sum_p \sum_c (X_{p:c\sim})^2$
c            $[X_c] = n_p n_i \sum_c \bar{X}_c^2$               $[X_{c\sim}] = [X_c] - [X]$                                    $n_p n_i \sum_c (X_{c\sim})^2$
i            $[X_i] = n_p n_c \sum_i \bar{X}_i^2$               $[X_{i\sim}] = [X_i] - [X]$                                    $n_p n_c \sum_i (X_{i\sim})^2$
ci           $[X_{ci}] = n_p \sum_c \sum_i \bar{X}_{ci}^2$      $[X_{ci\sim}] = [X_{ci}] - [X_c] - [X_i] + [X]$                $n_p \sum_c \sum_i (X_{ci\sim})^2$
pi:c         $[X_{pi:c}] = \sum_p \sum_i \sum_c X_{pi:c}^2$     $[X_{pi:c\sim}] = [X_{pi:c}] - [X_{p:c}] - [X_{ci}] + [X_c]$   $\sum_p \sum_i \sum_c (X_{pi:c\sim})^2$
Total        $[X] = n_p n_i n_c \bar{X}^2$

Note. For this design with one observation per cell, $\bar{X}_{pi:c}$ is based on only one observation, and, therefore,
$\bar{X}_{pi:c} = X_{pi:c}$.
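
The bracket quantities for the (p:c) x i design can be computed directly from a three-way array, as in the following sketch. It assumes balanced data stored with shape (n_c, n_p, n_i); the function name and return layout are hypothetical. The variance component formulas in the last step are the random effects estimators for this design.

```python
import numpy as np

def analyze_p_in_c_by_i(x):
    """Sums of squares, mean squares, and variance components for (p:c) x i.

    `x[c, p, i]` is the score of person p (nested in class c) on item i.
    """
    x = np.asarray(x, dtype=float)
    n_c, n_p, n_i = x.shape

    # Sums of squares for observed mean scores (the bracket terms).
    T = n_c * n_p * n_i * x.mean() ** 2                   # [X]
    T_c = n_p * n_i * np.sum(x.mean(axis=(1, 2)) ** 2)    # [X_c]
    T_i = n_p * n_c * np.sum(x.mean(axis=(0, 1)) ** 2)    # [X_i]
    T_pc = n_i * np.sum(x.mean(axis=2) ** 2)              # [X_p:c]
    T_ci = n_p * np.sum(x.mean(axis=1) ** 2)              # [X_ci]
    T_pic = np.sum(x ** 2)                                # [X_pi:c]

    # Sums of squares for observed score effects.
    ss = {
        "p:c": T_pc - T_c,
        "c": T_c - T,
        "i": T_i - T,
        "ci": T_ci - T_c - T_i + T,
        "pi:c": T_pic - T_pc - T_ci + T_c,
    }
    df = {
        "p:c": n_c * (n_p - 1),
        "c": n_c - 1,
        "i": n_i - 1,
        "ci": (n_c - 1) * (n_i - 1),
        "pi:c": n_c * (n_p - 1) * (n_i - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    var = {
        "p:c": (ms["p:c"] - ms["pi:c"]) / n_i,
        "c": (ms["c"] - ms["p:c"] - ms["ci"] + ms["pi:c"]) / (n_p * n_i),
        "i": (ms["i"] - ms["ci"]) / (n_p * n_c),
        "ci": (ms["ci"] - ms["pi:c"]) / n_p,
        "pi:c": ms["pi:c"],
    }
    return ss, ms, var
```

A useful check on any such implementation is that the five effect sums of squares add up to the total sum of squares about the grand mean, since the bracket terms telescope to $[X_{pi:c}] - [X]$.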






TABLE 8

D Studies for Design p x i x o
With Person (p) as the Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]

Note. The entries $\tau$, $\delta$, and $\Delta$ indicate which estimated D study variance
components enter $\hat{\sigma}^2(\tau)$, $\hat{\sigma}^2(\delta)$, and $\hat{\sigma}^2(\Delta)$, respectively.






TABLE 9

D Studies for Design p x i
With Person (p) as the Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






TABLE 10

D Studies for Design p x (i:s)
With Person (p) as the Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






TABLE 11

D Studies for Design (p:c) x i
With Person Nested Within Class (p:c) as the
Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






TABLE 12

D Studies for Design (p:c) x (i:s:t)
With Person Nested Within Class (p:c) as the
Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






TABLE 13

D Studies for Design (p:c) x i
With Class (c) as the Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






TABLE 14

D Studies for Design (p:c) x (i:s:t)
With Class (c) as the Object of Measurement

Column headings: Estimated G Study Variance Components; D Study Sampling
Frequency; Estimated D Study Variance Components.

[Table body not recoverable from the scanned source.]






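TABLE 15

G Study and D Study Variance Components
for the Design p x i x o, with Person (p) as the Object of Measurement
when Items are Sampled from a Finite Universe ($N_i = N'_i < \infty$)

Estimated G Study                                                              D Study      Finite               Estimated D Study
Variance Components                                                            Sampling     Universe             Variance Components
for Random Sampling                                                            Frequency    Correction           for Random Sampling

$\hat{\sigma}^2(p \mid N_i < \infty) = \hat{\sigma}^2(p) + \hat{\sigma}^2(pi)/N_i$       1            1                    $\hat{\sigma}^2(p \mid N_i < \infty) = \hat{\sigma}^2(p) + \hat{\sigma}^2(pi)/N_i$
$\hat{\sigma}^2(i \mid N_i < \infty) = \hat{\sigma}^2(i)$                                $n'_i$       $(1 - n'_i/N_i)$     $\hat{\sigma}^2(I \mid N_i < \infty) = (1 - n'_i/N_i)\hat{\sigma}^2(i)/n'_i$
$\hat{\sigma}^2(o \mid N_i < \infty) = \hat{\sigma}^2(o) + \hat{\sigma}^2(io)/N_i$       $n'_o$       1                    $\hat{\sigma}^2(O \mid N_i < \infty) = \hat{\sigma}^2(o)/n'_o + \hat{\sigma}^2(io)/N_i n'_o$
$\hat{\sigma}^2(pi \mid N_i < \infty) = \hat{\sigma}^2(pi)$                              $n'_i$       $(1 - n'_i/N_i)$     $\hat{\sigma}^2(pI \mid N_i < \infty) = (1 - n'_i/N_i)\hat{\sigma}^2(pi)/n'_i$
$\hat{\sigma}^2(po \mid N_i < \infty) = \hat{\sigma}^2(po) + \hat{\sigma}^2(pio)/N_i$    $n'_o$       1                    $\hat{\sigma}^2(pO \mid N_i < \infty) = \hat{\sigma}^2(po)/n'_o + \hat{\sigma}^2(pio)/N_i n'_o$
$\hat{\sigma}^2(io \mid N_i < \infty) = \hat{\sigma}^2(io)$                              $n'_i n'_o$  $(1 - n'_i/N_i)$     $\hat{\sigma}^2(IO \mid N_i < \infty) = (1 - n'_i/N_i)\hat{\sigma}^2(io)/n'_i n'_o$
$\hat{\sigma}^2(pio \mid N_i < \infty) = \hat{\sigma}^2(pio)$                            $n'_i n'_o$  $(1 - n'_i/N_i)$     $\hat{\sigma}^2(pIO \mid N_i < \infty) = (1 - n'_i/N_i)\hat{\sigma}^2(pio)/n'_i n'_o$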
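
The finite universe corrections of Table 15 can be sketched as a small function. It takes random-model G study estimates for the p x i x o design and returns the D study components when items are sampled from a universe of size N_i; the function name, argument names, and dictionary keys are hypothetical.

```python
def d_study_components_finite_items(g, n_i_prime, n_o_prime, N_i):
    """D study variance components for p x i x o with a finite item universe.

    `g` maps 'p', 'i', 'o', 'pi', 'po', 'io', 'pio' to the G study
    estimates obtained under random (infinite universe) sampling.
    """
    # Finite universe correction applied to every component involving items.
    fpc = 1.0 - n_i_prime / N_i
    return {
        "p": g["p"] + g["pi"] / N_i,
        "I": fpc * g["i"] / n_i_prime,
        "O": g["o"] / n_o_prime + g["io"] / (N_i * n_o_prime),
        "pI": fpc * g["pi"] / n_i_prime,
        "pO": g["po"] / n_o_prime + g["pio"] / (N_i * n_o_prime),
        "IO": fpc * g["io"] / (n_i_prime * n_o_prime),
        "pIO": fpc * g["pio"] / (n_i_prime * n_o_prime),
    }

g = {"p": 1.0, "i": 2.0, "o": 3.0, "pi": 4.0, "po": 5.0, "io": 6.0, "pio": 7.0}
print(d_study_components_finite_items(g, n_i_prime=5, n_o_prime=2, N_i=10))
```

Two limiting cases are worth trying: when n_i_prime equals N_i the correction is zero and every item-sampling component vanishes, which corresponds to a fixed item facet, and as N_i grows large the results approach the usual infinite universe values.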






Figure Captions

Figure 1. Venn diagrams for five illustrative designs.

Figure 2. Decomposition of three variance components, for the random
effects model, in terms of mean squares for the design (p:c) x i.






